Automatic import for version 2.6.32.11

This commit is contained in:
Rosa 2012-04-14 04:10:33 +04:00
commit 82649ddef6
116 changed files with 359957 additions and 0 deletions

2
.abf.yml Normal file

@@ -0,0 +1,2 @@
sources:
"linux-2.6.32.11.tar.bz2": 67b40af11576077ac0443b24483b66e123ea542b

363
SoN-23-mm-swapfile.patch Normal file

@@ -0,0 +1,363 @@
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: [PATCH 23/31] mm: add support for non block device backed swap files
New address_space_operations methods are added:
int swapon(struct file *);
int swapoff(struct file *);
int swap_out(struct file *, struct page *, struct writeback_control *);
int swap_in(struct file *, struct page *);
When during sys_swapon() the ->swapon() method is found and returns no error
the swapper_space.a_ops will proxy to sis->swap_file->f_mapping->a_ops, and
make use of ->swap_{out,in}() to write/read swapcache pages.
The ->swapon() method will be used to communicate to the file that the VM
relies on it, and the address_space should take adequate measures (like
reserving memory for mempools or the like). The ->swapoff() method will be
called on sys_swapoff() when ->swapon() was found and returned no error.
This new interface can be used to obviate the need for ->bmap in the swapfile
code. A filesystem would need to load (and maybe even allocate) the full block
map for a file into memory and pin it there on ->swapon() so that
->swap_{out,in}() have instant access to it. It can be released on ->swapoff().
The reason to provide ->swap_{out,in}() over using {write,read}page() is to
1) make a distinction between swapcache and pagecache pages, and
2) to provide a struct file * for credential context (normally not needed
in the context of writepage, as the page content is normally dirtied
using either of the following interfaces:
write_{begin,end}()
{prepare,commit}_write()
page_mkwrite()
which do have the file context).
[miklos@szeredi.hu: cleanups]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
Documentation/filesystems/Locking | 22 ++++++++++++++++
Documentation/filesystems/vfs.txt | 18 +++++++++++++
include/linux/buffer_head.h | 1
include/linux/fs.h | 9 ++++++
include/linux/swap.h | 4 ++
mm/page_io.c | 52 ++++++++++++++++++++++++++++++++++++++
mm/swap_state.c | 4 +-
mm/swapfile.c | 30 ++++++++++++++++++++-
8 files changed, 136 insertions(+), 4 deletions(-)
Index: linux-2.6.32-master/Documentation/filesystems/Locking
===================================================================
--- linux-2.6.32-master.orig/Documentation/filesystems/Locking
+++ linux-2.6.32-master/Documentation/filesystems/Locking
@@ -174,6 +174,10 @@ prototypes:
int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
loff_t offset, unsigned long nr_segs);
int (*launder_page) (struct page *);
+ int (*swapon) (struct file *);
+ int (*swapoff) (struct file *);
+ int (*swap_out) (struct file *, struct page *, struct writeback_control *);
+ int (*swap_in) (struct file *, struct page *);
locking rules:
All except set_page_dirty may block
@@ -193,6 +197,10 @@ invalidatepage: no yes
releasepage: no yes
direct_IO: no
launder_page: no yes
+swapon no
+swapoff no
+swap_out no yes, unlocks
+swap_in no yes, unlocks
->write_begin(), ->write_end(), ->sync_page() and ->readpage()
may be called from the request handler (/dev/loop).
@@ -292,6 +300,20 @@ cleaned, or an error value if not. Note
getting mapped back in and redirtied, it needs to be kept locked
across the entire operation.
+ ->swapon() will be called with a non-zero argument on files backing
+(non block device backed) swapfiles. A return value of zero indicates success,
+in which case this file can be used for backing swapspace. The swapspace
+operations will be proxied to the address space operations.
+
+ ->swapoff() will be called in the sys_swapoff() path when ->swapon()
+returned success.
+
+ ->swap_out() when swapon() returned success, this method is used to
+write the swap page.
+
+ ->swap_in() when swapon() returned success, this method is used to
+read the swap page.
+
Note: currently almost all instances of address_space methods are
using BKL for internal serialization and that's one of the worst sources
of contention. Normally they are calling library functions (in fs/buffer.c)
Index: linux-2.6.32-master/Documentation/filesystems/vfs.txt
===================================================================
--- linux-2.6.32-master.orig/Documentation/filesystems/vfs.txt
+++ linux-2.6.32-master/Documentation/filesystems/vfs.txt
@@ -537,6 +537,11 @@ struct address_space_operations {
int (*migratepage) (struct page *, struct page *);
int (*launder_page) (struct page *);
int (*error_remove_page) (struct mapping *mapping, struct page *page);
+ int (*swapon)(struct file *);
+ int (*swapoff)(struct file *);
+ int (*swap_out)(struct file *file, struct page *page,
+ struct writeback_control *wbc);
+ int (*swap_in)(struct file *file, struct page *page);
};
writepage: called by the VM to write a dirty page to backing store.
@@ -701,6 +706,19 @@ struct address_space_operations {
unless you have them locked or reference counts increased.
+ swapon: Called when swapon is used on a file. A
+ return value of zero indicates success, in which case this
+ file can be used to back swapspace. The swapspace operations
+ will be proxied to this address space's ->swap_{out,in} methods.
+
+ swapoff: Called during swapoff on files where swapon was successful.
+
+ swap_out: Called to write a swapcache page to a backing store, similar to
+ writepage.
+
+ swap_in: Called to read a swapcache page from a backing store, similar to
+ readpage.
+
The File Object
===============
Index: linux-2.6.32-master/include/linux/buffer_head.h
===================================================================
--- linux-2.6.32-master.orig/include/linux/buffer_head.h
+++ linux-2.6.32-master/include/linux/buffer_head.h
@@ -339,6 +339,7 @@ static inline int inode_has_buffers(stru
static inline void invalidate_inode_buffers(struct inode *inode) {}
static inline int remove_inode_buffers(struct inode *inode) { return 1; }
static inline int sync_mapping_buffers(struct address_space *mapping) { return 0; }
+static inline void block_sync_page(struct page *) { }
#endif /* CONFIG_BLOCK */
#endif /* _LINUX_BUFFER_HEAD_H */
Index: linux-2.6.32-master/include/linux/fs.h
===================================================================
--- linux-2.6.32-master.orig/include/linux/fs.h
+++ linux-2.6.32-master/include/linux/fs.h
@@ -603,6 +603,15 @@ struct address_space_operations {
int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
unsigned long);
int (*error_remove_page)(struct address_space *, struct page *);
+
+ /*
+ * swapfile support
+ */
+ int (*swapon)(struct file *file);
+ int (*swapoff)(struct file *file);
+ int (*swap_out)(struct file *file, struct page *page,
+ struct writeback_control *wbc);
+ int (*swap_in)(struct file *file, struct page *page);
};
/*
Index: linux-2.6.32-master/include/linux/swap.h
===================================================================
--- linux-2.6.32-master.orig/include/linux/swap.h
+++ linux-2.6.32-master/include/linux/swap.h
@@ -145,6 +145,7 @@ enum {
SWP_DISCARDABLE = (1 << 2), /* blkdev supports discard */
SWP_DISCARDING = (1 << 3), /* now discarding a free cluster */
SWP_SOLIDSTATE = (1 << 4), /* blkdev seeks are cheap */
+ SWP_FILE = (1 << 5), /* file swap area */
/* add others here before... */
SWP_SCANNING = (1 << 8), /* refcount in scan_swap_map */
};
@@ -285,6 +286,8 @@ extern void swap_unplug_io_fn(struct bac
/* linux/mm/page_io.c */
extern int swap_readpage(struct page *);
extern int swap_writepage(struct page *page, struct writeback_control *wbc);
+extern void swap_sync_page(struct page *page);
+extern int swap_set_page_dirty(struct page *page);
extern void end_swap_bio_read(struct bio *bio, int err);
/* linux/mm/swap_state.c */
@@ -320,6 +323,7 @@ extern unsigned int count_swap_pages(int
extern sector_t map_swap_page(struct swap_info_struct *, pgoff_t);
extern sector_t swapdev_block(int, pgoff_t);
extern struct swap_info_struct *get_swap_info_struct(unsigned);
+extern struct swap_info_struct *page_swap_info(struct page *);
extern int reuse_swap_page(struct page *);
extern int try_to_free_swap(struct page *);
struct backing_dev_info;
Index: linux-2.6.32-master/mm/page_io.c
===================================================================
--- linux-2.6.32-master.orig/mm/page_io.c
+++ linux-2.6.32-master/mm/page_io.c
@@ -16,6 +16,7 @@
#include <linux/swap.h>
#include <linux/bio.h>
#include <linux/swapops.h>
+#include <linux/buffer_head.h>
#include <linux/writeback.h>
#include <asm/pgtable.h>
@@ -97,11 +98,23 @@ int swap_writepage(struct page *page, st
{
struct bio *bio;
int ret = 0, rw = WRITE;
+ struct swap_info_struct *sis = page_swap_info(page);
if (try_to_free_swap(page)) {
unlock_page(page);
goto out;
}
+
+ if (sis->flags & SWP_FILE) {
+ struct file *swap_file = sis->swap_file;
+ struct address_space *mapping = swap_file->f_mapping;
+
+ ret = mapping->a_ops->swap_out(swap_file, page, wbc);
+ if (!ret)
+ count_vm_event(PSWPOUT);
+ return ret;
+ }
+
bio = get_swap_bio(GFP_NOIO, page_private(page), page,
end_swap_bio_write);
if (bio == NULL) {
@@ -120,13 +133,52 @@ out:
return ret;
}
+void swap_sync_page(struct page *page)
+{
+ struct swap_info_struct *sis = page_swap_info(page);
+
+ if (sis->flags & SWP_FILE) {
+ struct address_space *mapping = sis->swap_file->f_mapping;
+
+ if (mapping->a_ops->sync_page)
+ mapping->a_ops->sync_page(page);
+ } else {
+ block_sync_page(page);
+ }
+}
+
+int swap_set_page_dirty(struct page *page)
+{
+ struct swap_info_struct *sis = page_swap_info(page);
+
+ if (sis->flags & SWP_FILE) {
+ struct address_space *mapping = sis->swap_file->f_mapping;
+
+ return mapping->a_ops->set_page_dirty(page);
+ } else {
+ return __set_page_dirty_nobuffers(page);
+ }
+}
+
int swap_readpage(struct page *page)
{
struct bio *bio;
int ret = 0;
+ struct swap_info_struct *sis = page_swap_info(page);
VM_BUG_ON(!PageLocked(page));
VM_BUG_ON(PageUptodate(page));
+
+ if (sis->flags & SWP_FILE) {
+ struct file *swap_file = sis->swap_file;
+ struct address_space *mapping = swap_file->f_mapping;
+
+ ret = mapping->a_ops->swap_in(swap_file, page);
+ if (!ret)
+ count_vm_event(PSWPIN);
+ return ret;
+ }
+
bio = get_swap_bio(GFP_KERNEL, page_private(page), page,
end_swap_bio_read);
if (bio == NULL) {
Index: linux-2.6.32-master/mm/swap_state.c
===================================================================
--- linux-2.6.32-master.orig/mm/swap_state.c
+++ linux-2.6.32-master/mm/swap_state.c
@@ -28,8 +28,8 @@
*/
static const struct address_space_operations swap_aops = {
.writepage = swap_writepage,
- .sync_page = block_sync_page,
- .set_page_dirty = __set_page_dirty_nobuffers,
+ .sync_page = swap_sync_page,
+ .set_page_dirty = swap_set_page_dirty,
.migratepage = migrate_page,
};
Index: linux-2.6.32-master/mm/swapfile.c
===================================================================
--- linux-2.6.32-master.orig/mm/swapfile.c
+++ linux-2.6.32-master/mm/swapfile.c
@@ -1336,6 +1336,14 @@ static void destroy_swap_extents(struct
list_del(&se->list);
kfree(se);
}
+
+ if (sis->flags & SWP_FILE) {
+ struct file *swap_file = sis->swap_file;
+ struct address_space *mapping = swap_file->f_mapping;
+
+ sis->flags &= ~SWP_FILE;
+ mapping->a_ops->swapoff(swap_file);
+ }
}
/*
@@ -1410,7 +1418,9 @@ add_swap_extent(struct swap_info_struct
*/
static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
{
- struct inode *inode;
+ struct file *swap_file = sis->swap_file;
+ struct address_space *mapping = swap_file->f_mapping;
+ struct inode *inode = mapping->host;
unsigned blocks_per_page;
unsigned long page_no;
unsigned blkbits;
@@ -1421,13 +1431,22 @@ static int setup_swap_extents(struct swa
int nr_extents = 0;
int ret;
- inode = sis->swap_file->f_mapping->host;
if (S_ISBLK(inode->i_mode)) {
ret = add_swap_extent(sis, 0, sis->max, 0);
*span = sis->pages;
goto done;
}
+ if (mapping->a_ops->swapon) {
+ ret = mapping->a_ops->swapon(swap_file);
+ if (!ret) {
+ sis->flags |= SWP_FILE;
+ ret = add_swap_extent(sis, 0, sis->max, 0);
+ *span = sis->pages;
+ }
+ goto done;
+ }
+
blkbits = inode->i_blkbits;
blocks_per_page = PAGE_SIZE >> blkbits;
@@ -2166,6 +2185,13 @@ get_swap_info_struct(unsigned type)
return &swap_info[type];
}
+struct swap_info_struct *page_swap_info(struct page *page)
+{
+ swp_entry_t swap = { .val = page_private(page) };
+ BUG_ON(!PageSwapCache(page));
+ return &swap_info[swp_type(swap)];
+}
+
/*
* swap_lock prevents swap_map being freed. Don't grab an extra
* reference on the swaphandle, it doesn't matter if it becomes unused.

46
add-console-use-vt Normal file

@@ -0,0 +1,46 @@
Subject: add console_use_vt
From: kraxel@suse.de
Patch-mainline: no
$subject says all
--- sle11sp1-2010-03-01.orig/drivers/char/tty_io.c 2010-03-01 14:09:07.000000000 +0100
+++ sle11sp1-2010-03-01/drivers/char/tty_io.c 2010-03-01 14:09:43.000000000 +0100
@@ -136,6 +136,8 @@ LIST_HEAD(tty_drivers); /* linked list
DEFINE_MUTEX(tty_mutex);
EXPORT_SYMBOL(tty_mutex);
+int console_use_vt = 1;
+
static ssize_t tty_read(struct file *, char __user *, size_t, loff_t *);
static ssize_t tty_write(struct file *, const char __user *, size_t, loff_t *);
ssize_t redirected_tty_write(struct file *, const char __user *,
@@ -1736,7 +1738,7 @@ retry_open:
goto got_driver;
}
#ifdef CONFIG_VT
- if (device == MKDEV(TTY_MAJOR, 0)) {
+ if (console_use_vt && device == MKDEV(TTY_MAJOR, 0)) {
extern struct tty_driver *console_driver;
driver = tty_driver_kref_get(console_driver);
index = fg_console;
@@ -3138,7 +3140,8 @@ static int __init tty_init(void)
"console");
#ifdef CONFIG_VT
- vty_init(&console_fops);
+ if (console_use_vt)
+ vty_init(&console_fops);
#endif
return 0;
}
--- sle11sp1-2010-03-01.orig/include/linux/console.h 2010-03-01 14:09:07.000000000 +0100
+++ sle11sp1-2010-03-01/include/linux/console.h 2009-10-12 12:16:48.000000000 +0200
@@ -63,6 +63,7 @@ extern const struct consw dummy_con; /*
extern const struct consw vga_con; /* VGA text console */
extern const struct consw newport_con; /* SGI Newport console */
extern const struct consw prom_con; /* SPARC PROM console */
+extern int console_use_vt;
int con_is_bound(const struct consw *csw);
int register_con_driver(const struct consw *csw, int first, int last);

bug-561933_uv_pat_is_gru_range.patch Normal file

@@ -0,0 +1,166 @@
From: Jack Steiner <steiner@sgi.com>
Subject: x86: UV SGI: Don't track GRU space in PAT
References: bnc#561933, fate#306952
Patch-mainline: 2.6.33-rc1
Git-commit: fd12a0d69aee6d90fa9b9890db24368a897f8423
GRU space is always mapped as WB in the page table. There is
no need to track the mappings in the PAT. This also eliminates
the "freeing invalid memtype" messages when the GRU space is unmapped.
Version 2 with changes suggested by Ingo (at least I think I understood what
he wanted).
Version 3 with changes suggested by Peter to make the new function
a member of the x86_platform structure.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Rafael J. Wysocki <rjw@suse.de>
---
arch/x86/include/asm/pat.h | 2 ++
arch/x86/include/asm/x86_init.h | 2 ++
arch/x86/kernel/apic/x2apic_uv_x.c | 19 ++++++++++++++++++-
arch/x86/kernel/x86_init.c | 2 ++
arch/x86/mm/pat.c | 12 +++++++++---
5 files changed, 33 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/pat.h
+++ b/arch/x86/include/asm/pat.h
@@ -24,4 +24,6 @@ int io_reserve_memtype(resource_size_t s
void io_free_memtype(resource_size_t start, resource_size_t end);
+int default_is_untracked_pat_range(u64 start, u64 end);
+
#endif /* _ASM_X86_PAT_H */
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -113,11 +113,13 @@ struct x86_cpuinit_ops {
/**
* struct x86_platform_ops - platform specific runtime functions
+ * @is_untracked_pat_range exclude from PAT logic
* @calibrate_tsc: calibrate TSC
* @get_wallclock: get time from HW clock like RTC etc.
* @set_wallclock: set time back to HW clock
*/
struct x86_platform_ops {
+ int (*is_untracked_pat_range)(u64 start, u64 end);
unsigned long (*calibrate_tsc)(void);
unsigned long (*get_wallclock)(void);
int (*set_wallclock)(unsigned long nowtime);
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -30,10 +30,22 @@
#include <asm/apic.h>
#include <asm/ipi.h>
#include <asm/smp.h>
+#include <asm/x86_init.h>
DEFINE_PER_CPU(int, x2apic_extra_bits);
static enum uv_system_type uv_system_type;
+static u64 gru_start_paddr, gru_end_paddr;
+
+static int is_GRU_range(u64 start, u64 end)
+{
+ return start >= gru_start_paddr && end < gru_end_paddr;
+}
+
+static int uv_is_untracked_pat_range(u64 start, u64 end)
+{
+ return is_ISA_range(start, end) || is_GRU_range(start, end);
+}
static int early_get_nodeid(void)
{
@@ -49,6 +61,7 @@ static int early_get_nodeid(void)
static int __init uv_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
{
if (!strcmp(oem_id, "SGI")) {
+ x86_platform.is_untracked_pat_range = uv_is_untracked_pat_range;
if (!strcmp(oem_table_id, "UVL"))
uv_system_type = UV_LEGACY_APIC;
else if (!strcmp(oem_table_id, "UVX"))
@@ -385,8 +398,12 @@ static __init void map_gru_high(int max_
int shift = UVH_RH_GAM_GRU_OVERLAY_CONFIG_MMR_BASE_SHFT;
gru.v = uv_read_local_mmr(UVH_RH_GAM_GRU_OVERLAY_CONFIG_MMR);
- if (gru.s.enable)
+ if (gru.s.enable) {
map_high("GRU", gru.s.base, shift, shift, max_pnode, map_wb);
+ gru_start_paddr = ((u64)gru.s.base << shift);
+ gru_end_paddr = gru_start_paddr + (1UL << shift) * (max_pnode + 1);
+
+ }
}
static __init void map_mmr_high(int max_pnode)
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -13,6 +13,7 @@
#include <asm/e820.h>
#include <asm/time.h>
#include <asm/irq.h>
+#include <asm/pat.h>
#include <asm/tsc.h>
void __cpuinit x86_init_noop(void) { }
@@ -69,6 +70,7 @@ struct x86_cpuinit_ops x86_cpuinit __cpu
};
struct x86_platform_ops x86_platform = {
+ .is_untracked_pat_range = default_is_untracked_pat_range,
.calibrate_tsc = native_calibrate_tsc,
.get_wallclock = mach_get_cmos_time,
.set_wallclock = mach_set_rtc_mmss,
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -20,6 +20,7 @@
#include <asm/cacheflush.h>
#include <asm/processor.h>
#include <asm/tlbflush.h>
+#include <asm/x86_init.h>
#include <asm/pgtable.h>
#include <asm/fcntl.h>
#include <asm/e820.h>
@@ -348,6 +349,11 @@ static int free_ram_pages_type(u64 start
return 0;
}
+int default_is_untracked_pat_range(u64 start, u64 end)
+{
+ return is_ISA_range(start, end);
+}
+
/*
* req_type typically has one of the:
* - _PAGE_CACHE_WB
@@ -388,7 +394,7 @@ int reserve_memtype(u64 start, u64 end,
}
/* Low ISA region is always mapped WB in page table. No need to track */
- if (is_ISA_range(start, end - 1)) {
+ if (x86_platform.is_untracked_pat_range(start, end - 1)) {
if (new_type)
*new_type = _PAGE_CACHE_WB;
return 0;
@@ -499,7 +505,7 @@ int free_memtype(u64 start, u64 end)
return 0;
/* Low ISA region is always mapped WB. No need to track */
- if (is_ISA_range(start, end - 1))
+ if (x86_platform.is_untracked_pat_range(start, end - 1))
return 0;
is_range_ram = pat_pagerange_is_ram(start, end);
@@ -582,7 +588,7 @@ static unsigned long lookup_memtype(u64
int rettype = _PAGE_CACHE_WB;
struct memtype *entry;
- if (is_ISA_range(paddr, paddr + PAGE_SIZE - 1))
+ if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE - 1))
return rettype;
if (pat_pagerange_is_ram(paddr, paddr + PAGE_SIZE)) {

disable-mrproper-in-devel-rpms.patch Normal file

@@ -0,0 +1,33 @@
---
Makefile | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff -p -up linux-2.6.32/Makefile.orig linux-2.6.32/Makefile
--- linux-2.6.32/Makefile.orig 2009-12-10 18:24:46.341174801 -0200
+++ linux-2.6.32/Makefile 2009-12-10 18:25:22.919299464 -0200
@@ -1192,13 +1192,9 @@ CLEAN_DIRS += $(MODVERDIR)
CLEAN_FILES += vmlinux System.map \
.tmp_kallsyms* .tmp_version .tmp_vmlinux* .tmp_System.map
-# Directories & files removed with 'make mrproper'
-MRPROPER_DIRS += include/config include2 usr/include include/generated
-MRPROPER_FILES += .config .config.old include/asm .version .old_version \
- include/linux/autoconf.h include/linux/version.h \
- include/linux/utsrelease.h \
- include/linux/bounds.h include/asm*/asm-offsets.h \
- Module.symvers Module.markers tags TAGS cscope*
+# This is a -devel rpm, so we don't let mrproper remove anything /tmb 12.10.2007
+MRPROPER_DIRS += ""
+MRPROPER_FILES += ""
# clean - Delete most, but leave enough to build external modules
#
@@ -1224,7 +1220,7 @@ clean: archclean $(clean-dirs)
#
mrproper: rm-dirs := $(wildcard $(MRPROPER_DIRS))
mrproper: rm-files := $(wildcard $(MRPROPER_FILES))
-mrproper-dirs := $(addprefix _mrproper_,Documentation/DocBook scripts)
+mrproper-dirs := $(addprefix _mrproper_,Documentation/DocBook)
PHONY += $(mrproper-dirs) mrproper archmrproper
$(mrproper-dirs):

fix_clock_gettime_vsyscall_time_warp.diff Normal file

@@ -0,0 +1,161 @@
From: Lin Ming <ming.m.lin@intel.com>
Subject: timekeeping: Fix clock_gettime vsyscall time warp
Patch-mainline: 0696b711e4be45fa104c12329f617beb29c03f78
References: bnc#569238
commit 0696b711e4be45fa104c12329f617beb29c03f78
Author: Lin Ming <ming.m.lin@intel.com>
Date: Tue Nov 17 13:49:50 2009 +0800
timekeeping: Fix clock_gettime vsyscall time warp
Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
to struct timekeeper" the clock multiplier of vsyscall is updated with
the unmodified clock multiplier of the clock source and not with the
NTP adjusted multiplier of the timekeeper.
This causes user space observable time warps:
new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf
Add a new argument "mult" to update_vsyscall() and hand in the
timekeeping internal NTP adjusted multiplier.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kurt Garloff <garloff@suse.de>
---
arch/ia64/kernel/time.c | 4 ++--
arch/powerpc/kernel/time.c | 5 +++--
arch/s390/kernel/time.c | 3 ++-
arch/x86/kernel/vsyscall_64.c | 5 +++--
include/linux/clocksource.h | 6 ++++--
kernel/time/timekeeping.c | 6 +++---
6 files changed, 17 insertions(+), 12 deletions(-)
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -473,7 +473,7 @@ void update_vsyscall_tz(void)
{
}
-void update_vsyscall(struct timespec *wall, struct clocksource *c)
+void update_vsyscall(struct timespec *wall, struct clocksource *c, u32 mult)
{
unsigned long flags;
@@ -481,7 +481,7 @@ void update_vsyscall(struct timespec *wa
/* copy fsyscall clock data */
fsyscall_gtod_data.clk_mask = c->mask;
- fsyscall_gtod_data.clk_mult = c->mult;
+ fsyscall_gtod_data.clk_mult = mult;
fsyscall_gtod_data.clk_shift = c->shift;
fsyscall_gtod_data.clk_fsys_mmio = c->fsys_mmio;
fsyscall_gtod_data.clk_cycle_last = c->cycle_last;
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -828,7 +828,8 @@ static cycle_t timebase_read(struct cloc
return (cycle_t)get_tb();
}
-void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
+void update_vsyscall(struct timespec *wall_time, struct clocksource *clock,
+ u32 mult)
{
u64 t2x, stamp_xsec;
@@ -841,7 +842,7 @@ void update_vsyscall(struct timespec *wa
/* XXX this assumes clock->shift == 22 */
/* 4611686018 ~= 2^(20+64-22) / 1e9 */
- t2x = (u64) clock->mult * 4611686018ULL;
+ t2x = (u64) mult * 4611686018ULL;
stamp_xsec = (u64) xtime.tv_nsec * XSEC_PER_SEC;
do_div(stamp_xsec, 1000000000);
stamp_xsec += (u64) xtime.tv_sec * XSEC_PER_SEC;
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -214,7 +214,8 @@ struct clocksource * __init clocksource_
return &clocksource_tod;
}
-void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
+void update_vsyscall(struct timespec *wall_time, struct clocksource *clock,
+ u32 mult)
{
if (clock != &clocksource_tod)
return;
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -73,7 +73,8 @@ void update_vsyscall_tz(void)
write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
}
-void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
+void update_vsyscall(struct timespec *wall_time, struct clocksource *clock,
+ u32 mult)
{
unsigned long flags;
@@ -82,7 +83,7 @@ void update_vsyscall(struct timespec *wa
vsyscall_gtod_data.clock.vread = clock->vread;
vsyscall_gtod_data.clock.cycle_last = clock->cycle_last;
vsyscall_gtod_data.clock.mask = clock->mask;
- vsyscall_gtod_data.clock.mult = clock->mult;
+ vsyscall_gtod_data.clock.mult = mult;
vsyscall_gtod_data.clock.shift = clock->shift;
vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -282,10 +282,12 @@ extern struct clocksource * __init __wea
extern void clocksource_mark_unstable(struct clocksource *cs);
#ifdef CONFIG_GENERIC_TIME_VSYSCALL
-extern void update_vsyscall(struct timespec *ts, struct clocksource *c);
+extern void
+update_vsyscall(struct timespec *ts, struct clocksource *c, u32 mult);
extern void update_vsyscall_tz(void);
#else
-static inline void update_vsyscall(struct timespec *ts, struct clocksource *c)
+static inline void
+update_vsyscall(struct timespec *ts, struct clocksource *c, u32 mult)
{
}
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -177,7 +177,7 @@ void timekeeping_leap_insert(int leapsec
{
xtime.tv_sec += leapsecond;
wall_to_monotonic.tv_sec -= leapsecond;
- update_vsyscall(&xtime, timekeeper.clock);
+ update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
}
#ifdef CONFIG_GENERIC_TIME
@@ -337,7 +337,7 @@ int do_settimeofday(struct timespec *tv)
timekeeper.ntp_error = 0;
ntp_clear();
- update_vsyscall(&xtime, timekeeper.clock);
+ update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
write_sequnlock_irqrestore(&xtime_lock, flags);
@@ -822,7 +822,7 @@ void update_wall_time(void)
update_xtime_cache(nsecs);
/* check to see if there is a new clocksource to use */
- update_vsyscall(&xtime, timekeeper.clock);
+ update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
}
/**

4826
i386_defconfig-server Normal file

File diff suppressed because it is too large

35
ipv6-no-autoconf Normal file

@@ -0,0 +1,35 @@
From: Olaf Kirch <okir@suse.de>
Subject: Allow to bring up network interface w/o ipv6 autoconf
References: 161888
Patch-mainline: no
When bringing up a xen bridge device, it will always be configured to
use a MAC address of ff:ff:ff:ff:ff:fe. This greatly confuses IPv6 DAD,
which starts logging lots and lots of useless messages to syslog.
We really want to disable IPv6 on these interfaces, and there doesn't
seem to be a reliable way to do this without bringing the interface
up first (and triggering IPv6 autoconf).
This patch makes autoconf (DAD and router discovery) depend on the
interface's ability to do multicast. Turning off multicast for an
interface before bringing it up will suppress autoconfiguration.
--- sle11sp1-2010-03-22.orig/net/ipv6/addrconf.c 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/net/ipv6/addrconf.c 2010-03-22 12:10:12.000000000 +0100
@@ -2799,6 +2799,7 @@ static void addrconf_dad_start(struct in
spin_lock_bh(&ifp->lock);
if (dev->flags&(IFF_NOARP|IFF_LOOPBACK) ||
+ !(dev->flags&IFF_MULTICAST) ||
idev->cnf.accept_dad < 1 ||
!(ifp->flags&IFA_F_TENTATIVE) ||
ifp->flags & IFA_F_NODAD) {
@@ -2891,6 +2892,7 @@ static void addrconf_dad_completed(struc
if (ifp->idev->cnf.forwarding == 0 &&
ifp->idev->cnf.rtr_solicits > 0 &&
(dev->flags&IFF_LOOPBACK) == 0 &&
+ (dev->flags & IFF_MULTICAST) &&
(ipv6_addr_type(&ifp->addr) & IPV6_ADDR_LINKLOCAL)) {
/*
* If a host as already performed a random delay

kbuild-really-dont-remove-bounds-asm-offsets-headers.patch Normal file

@@ -0,0 +1,62 @@
Index: linux-2.6.30/Kbuild
===================================================================
--- linux-2.6.30.orig/Kbuild
+++ linux-2.6.30/Kbuild
@@ -10,8 +10,8 @@
bounds-file := include/linux/bounds.h
-always := $(bounds-file)
-targets := $(bounds-file) kernel/bounds.s
+always_noclean := $(bounds-file)
+targets := kernel/bounds.s
quiet_cmd_bounds = GEN $@
define cmd_bounds
@@ -45,8 +45,7 @@ $(obj)/$(bounds-file): kernel/bounds.s K
offsets-file := include/asm/asm-offsets.h
-always += $(offsets-file)
-targets += $(offsets-file)
+always_noclean += $(offsets-file)
targets += arch/$(SRCARCH)/kernel/asm-offsets.s
@@ -93,6 +92,3 @@ quiet_cmd_syscalls = CALL $<
PHONY += missing-syscalls
missing-syscalls: scripts/checksyscalls.sh FORCE
$(call cmd,syscalls)
-
-# Delete all targets during make clean
-clean-files := $(addprefix $(objtree)/,$(filter-out $(bounds-file) $(offsets-file),$(targets)))
Index: linux-2.6.30/scripts/Makefile.build
===================================================================
--- linux-2.6.30.orig/scripts/Makefile.build
+++ linux-2.6.30/scripts/Makefile.build
@@ -15,6 +15,7 @@ obj-m :=
lib-y :=
lib-m :=
always :=
+always_noclean :=
targets :=
subdir-y :=
subdir-m :=
@@ -92,7 +93,7 @@ modorder-target := $(obj)/modules.order
__build: $(if $(KBUILD_BUILTIN),$(builtin-target) $(lib-target) $(extra-y)) \
$(if $(KBUILD_MODULES),$(obj-m) $(modorder-target)) \
- $(subdir-ym) $(always)
+ $(subdir-ym) $(always) $(always_noclean)
@:
# Linus' kernel sanity checking tool
@@ -264,7 +265,7 @@ $(obj)/%.o: $(src)/%.S FORCE
$(call if_changed_dep,as_o_S)
targets += $(real-objs-y) $(real-objs-m) $(lib-y)
-targets += $(extra-y) $(MAKECMDGOALS) $(always)
+targets += $(extra-y) $(MAKECMDGOALS) $(always) $(always_noclean)
# Linker scripts preprocessor (.lds.S -> .lds)
# ---------------------------------------------------------------------------

504
kernel-xen.spec Normal file

@@ -0,0 +1,504 @@
%define name kernel-xen
%define version 2.6.32.11
%define rel 2
%define kernel_version 2.6.32.11
%define kernel_extraversion xen-%{rel}mdv
# ensures file uniqueness
%define kernel_file_string %{kernel_version}-%{kernel_extraversion}
# ensures package uniqueness
%define kernel_package_string %{kernel_version}-%{rel}mdv
%define kernel_source_dir %{_prefix}/src/%{name}-%{kernel_package_string}
%define kernel_devel_dir %{_prefix}/src/%{name}-devel-%{kernel_package_string}
%define _default_patch_fuzz 3
%ifarch %ix86
%define config %{SOURCE1}
%endif
%ifarch x86_64
%define config %{SOURCE2}
%endif
Name: %{name}
Version: %{version}
Release: %mkrel %{rel}
Summary: The Xen kernel
Group: System/Kernel and hardware
License: GPL
Source0: linux-%{kernel_version}.tar.bz2
Source1: i386_defconfig-server
Source2: x86_64_defconfig-server
Source12: disable-mrproper-in-devel-rpms.patch
Source13: kbuild-really-dont-remove-bounds-asm-offsets-headers.patch
# SUSE patches
Patch90: bug-561933_uv_pat_is_gru_range.patch
Patch91: x86-Unify-fixup_irqs-for-32-bit-and-64-bit-kernels.patch
Patch92: SoN-23-mm-swapfile.patch
Patch93: x86-cpu-mv-display_cacheinfo-cpu_detect_cache_sizes.patch
Patch94: fix_clock_gettime_vsyscall_time_warp.diff
### both uml framebuffer and xen need this one.
Patch100: add-console-use-vt
# split out patches
Patch101: linux-2.6.19-rc1-kexec-move_segment_code-i386.patch
Patch102: linux-2.6.19-rc1-kexec-move_segment_code-x86_64.patch
Patch103: ipv6-no-autoconf
Patch104: pci-guestdev
Patch105: pci-reserve
Patch106: sfc-driverlink
Patch107: sfc-resource-driver
Patch108: sfc-driverlink-conditional
Patch109: sfc-external-sram
Patch110: tmem
# bulk stuff, new files for xen
Patch200: xen3-auto-xen-arch.diff
Patch201: xen3-auto-xen-drivers.diff
Patch202: xen3-auto-include-xen-interface.diff
# kconfig bits for xen
Patch300: xen3-auto-xen-kconfig.diff
# common code changes
Patch400: xen3-auto-common.diff
Patch401: xen3-auto-arch-x86.diff
Patch402: xen3-auto-arch-i386.diff
Patch403: xen3-auto-arch-x86_64.diff
# fixups due to upstream Xen parts
Patch500: xen3-fixup-xen
Patch501: sfc-set-arch
Patch502: sfc-endianness
# newer changeset backports
# changes outside arch/{i386,x86_64}/xen
Patch700: xen3-fixup-kconfig
Patch701: xen3-fixup-common
Patch702: xen3-fixup-arch-x86
# ports of other patches
Patch800: xen3-patch-2.6.18
Patch801: xen3-patch-2.6.19
Patch802: xen3-patch-2.6.20
Patch803: xen3-patch-2.6.21
Patch804: xen3-patch-2.6.22
Patch805: xen3-patch-2.6.23
Patch806: xen3-patch-2.6.24
Patch807: xen3-patch-2.6.25
Patch808: xen3-patch-2.6.26
Patch809: xen3-patch-2.6.27
Patch810: xen3-patch-2.6.28
Patch811: xen3-patch-2.6.29
Patch812: xen3-patch-2.6.30
Patch813: xen3-patch-2.6.31
Patch814: xen3-patch-2.6.32
Patch815: xen3-patch-2.6.32.1-2
Patch816: xen3-patch-2.6.32.2-3
Patch817: xen3-patch-2.6.32.3-4
Patch818: xen3-patch-2.6.32.7-8
Patch819: xen3-patch-2.6.32.8-9
Patch820: xen3-patch-2.6.32.9-10
Patch821: xen3-seccomp-disable-tsc-option
Patch822: xen3-fix_clock_gettime_vsyscall_time_warp.diff
Patch823: xen3-x86-mcp51-no-dac
#Patch824: xen3-x86-64-preserve-large-page-mapping-for-1st-2mb-kernel-txt-with-config_debug_rodata
#Patch825: xen3-x86-64-align-rodata-kernel-section-to-2mb-with-config_debug_rodata
#Patch826: xen3-x86-mark_rodata_rw.patch
#Patch827: xen3-x86-ftrace-fix-rodata-1.patch
#Patch828: xen3-x86-ftrace-fix-rodata-3.patch
Patch829: xen3-x86-Remove-CPU-cache-size-output-for-non-Intel-too.patch
Patch830: xen3-x86-cpu-mv-display_cacheinfo-cpu_detect_cache_sizes.patch
Patch831: xen3-x86-Limit-the-number-of-processor-bootup-messages.patch
Patch832: xen3-x86_64_apic_consider_hotplug_for_mode_logical_flat.patch
Patch833: xen3-x86_ioapic_fix_out_of_order_gsi.patch
Patch834: xen3-x86-Reduce-per-cpu-warning-boot-up-messages.patch
Patch835: xen3-x86-pat-Update-page-flags-for-memtype-without-using-memtype_lock-V4.patch
Patch836: xen3-bug-561933_uv_pat_is_gru_range.patch
Patch837: xen3-x86-Fix-sched_clock_cpu-for-systems-with-unsynchronized-TSC.patch
Patch838: xen3-x86-Unify-fixup_irqs-for-32-bit-and-64-bit-kernels.patch
Patch839: xen3-x86-intr-remap-Avoid-irq_chip-mask-unmask-in-fixup_irqs-for-intr-remapping.patch
Patch840: xen3-x86-Remove-local_irq_enable-local_irq_disable-in-fixup_irqs.patch
#Patch841: xen3-vmw_pvscsi-scsi-driver-for-vmware-s-virtual-hba.patch
#Patch842: xen3-add-support-for-intel-cougar-point-chipset.patch
#Patch843: xen3-kdb-x86
Patch844: xen3-stack-unwind
Patch845: xen3-x86_64-unwind-annotations
# bugfixes and enhancements
Patch900: xen-balloon-max-target
Patch901: xen-modular-blktap
Patch902: xen-blkback-bimodal-suse
Patch903: xen-blkif-protocol-fallback-hack
Patch904: xen-blkback-cdrom
Patch905: xen-blktap-write-barriers
Patch906: xen-op-packet
Patch907: xen-blkfront-cdrom
Patch908: xen-sections
Patch909: xen-swiotlb-heuristics
Patch910: xen-kconfig-compat
Patch911: xen-cpufreq-report
Patch912: xen-staging-build
Patch913: xen-sysdev-suspend
Patch914: xen-ipi-per-cpu-irq
Patch915: xen-virq-per-cpu-irq
Patch916: xen-spinlock-poll-early
Patch917: xen-configurable-guest-devices
Patch918: xen-netback-nr-irqs
Patch919: xen-netback-notify-multi
Patch920: xen-netback-generalize
Patch921: xen-netback-multiple-tasklets
Patch922: xen-netback-kernel-threads
Patch923: xen-netfront-ethtool
Patch924: xen-unpriv-build
Patch925: xen-dcdbas
Patch926: xen-floppy
Patch927: xen-x86-panic-no-reboot
Patch928: xen-x86-dcr-fallback
Patch929: xen-x86-consistent-nmi
Patch930: xen-x86-no-lapic
Patch931: xen-x86-pmd-handling
Patch932: xen-x86-bigmem
Patch933: xen-x86-machphys-prediction
Patch934: xen-x86-exit-mmap
Patch935: xen-x86-per-cpu-vcpu-info
Patch936: xen-x86-xtime-lock
Patch937: xen-x86-time-per-cpu
Patch938: xen-x86_64-pgd-pin
Patch939: xen-x86_64-pgd-alloc-order
Patch940: xen-x86_64-dump-user-pgt
Patch941: xen-x86_64-note-init-p2m
BuildRoot: %{_tmppath}/%{name}-%{version}
%description
The XEN kernel.
%package -n kernel-xen-%{kernel_package_string}
Version: 1
Release: %mkrel 1
Summary: XEN kernel
Group: System/Kernel and hardware
Provides: kernel = %{kernel_version}
Provides: kernel-xen = %{kernel_version}
Requires(post): bootloader-utils mkinitrd xen-hypervisor
Requires(postun): bootloader-utils
%description -n kernel-xen-%{kernel_package_string}
The XEN kernel.
%package devel-%{kernel_package_string}
Version: 1
Release: %mkrel 1
Summary: XEN kernel devel files
Group: System/Kernel and hardware
Provides: kernel-devel = %{kernel_version}
Autoreqprov: no
%description devel-%{kernel_package_string}
This package contains the kernel-devel files that should be enough to build
third-party drivers against for use with kernel-xen-%{kernel_package_string}.
%package source-%{kernel_package_string}
Version: 1
Release: %mkrel 1
Summary: XEN kernel sources
Group: System/Kernel and hardware
Provides: kernel-source = %{kernel_version}
Autoreqprov: no
%description source-%{kernel_package_string}
This package contains the source code files for the Linux
kernel. These source files are only needed if you want to build your own
custom kernel that is better tuned to your particular hardware.
%package debug-%{kernel_package_string}
Version: 1
Release: %mkrel 1
Summary: Xen kernel debug files
Group: Development/Debug
Requires: glibc-devel
Provides: kernel-debug = %{kernel_version}
Autoreqprov: no
%description debug-%{kernel_package_string}
This package contains the kernel-debug files that should be enough to
use debugging/monitoring tools (like systemtap, oprofile, ...)
%package doc-%{kernel_package_string}
Version: 1
Release: %mkrel 1
Summary: XEN kernel documentation
Group: System/Kernel and hardware
Autoreqprov: no
%description doc-%{kernel_package_string}
This package contains documentation files from the kernel source. Various
bits of information about the Linux kernel and the device drivers shipped
with it are documented in these files. You might also want to install this
package if you need a reference to the options that can be passed to Linux
kernel modules at load time.
%prep
%setup -q -n linux-%{kernel_version}
%apply_patches
%build
perl -p \
-e 's/CONFIG_LOCALVERSION=.*/CONFIG_LOCALVERSION="-%{kernel_extraversion}"/' \
< %config > .config
%make oldconfig
%make
%make modules
%install
rm -rf %{buildroot}
install -d -m 755 %{buildroot}/boot
install -m 644 System.map %{buildroot}/boot/System.map-%{kernel_file_string}
install -m 644 .config %{buildroot}/boot/config-%{kernel_file_string}
install -m 644 arch/x86/boot/vmlinuz \
%{buildroot}/boot/vmlinuz-%{kernel_file_string}
# modules
%make modules_install INSTALL_MOD_PATH=%{buildroot}
# remove firmwares
rm -rf %{buildroot}/lib/firmware
# remove symlinks
rm -f %{buildroot}/lib/modules/%{kernel_file_string}/build
rm -f %{buildroot}/lib/modules/%{kernel_file_string}/source
# strip modules, as spec-helper won't recognize them once compressed
find %{buildroot}/lib/modules/%{kernel_file_string}/kernel -name '*.ko' \
-exec objcopy --only-keep-debug '{}' '{}'.debug \;
find %{buildroot}/lib/modules/%{kernel_file_string}/kernel -name '*.ko' \
-exec objcopy --add-gnu-debuglink='{}'.debug --strip-debug '{}' \;
find %{buildroot}/lib/modules/%{kernel_file_string}/kernel -name '*.ko.debug' | \
sed -e 's|%{buildroot}||' > kernel_debug_files.list
# create an exclusion list for those debug files
sed -e 's|^|%exclude |' < kernel_debug_files.list > no_kernel_debug_files.list
# compress modules
find %{buildroot}/lib/modules/%{kernel_file_string} -name '*.ko' | xargs gzip -9
/sbin/depmod -u -ae -b %{buildroot} -r \
-F %{buildroot}/boot/System.map-%{kernel_file_string} \
%{kernel_file_string}
# create modules description
pushd %{buildroot}/lib/modules/%{kernel_file_string}
find . -name '*.ko.gz' | xargs /sbin/modinfo | \
perl -lne 'print "$name\t$1" if $name && /^description:\s*(.*)/; $name = $1 if m!^filename:\s*(.*)\.k?o!; $name =~ s!.*/!!' \
> modules.description
popd
# install kernel sources
install -d -m 755 %{buildroot}%{kernel_source_dir}
tar cf - . \
--exclude '*.o' --exclude '*.ko' --exclude '*.cmd' \
--exclude '.temp*' --exclude '.tmp*' --exclude '*.0[0-9][0-9][0-9]' \
--exclude modules.order --exclude .gitignore \
| tar xf - -C %{buildroot}%{kernel_source_dir}
chmod -R a+rX %{buildroot}%{kernel_source_dir}
# we remove all the source files that we don't ship
# first architecture files
for i in alpha arm arm26 avr32 blackfin cris frv h8300 ia64 microblaze mips \
m32r m68k m68knommu mn10300 parisc powerpc ppc s390 sh sh64 sparc v850 xtensa; do
rm -rf %{buildroot}%{kernel_source_dir}/arch/$i
rm -rf %{buildroot}%{kernel_source_dir}/include/asm-$i
done
%ifnarch %{ix86} x86_64
rm -rf %{buildroot}%{kernel_source_dir}/arch/x86
rm -rf %{buildroot}%{kernel_source_dir}/include/asm-x86
%endif
rm -rf %{buildroot}%{kernel_source_dir}/vmlinux
rm -rf %{buildroot}%{kernel_source_dir}/System.map
rm -rf %{buildroot}%{kernel_source_dir}/Module.*
rm -rf %{buildroot}%{kernel_source_dir}/*.list
rm -rf %{buildroot}%{kernel_source_dir}/.config.*
rm -rf %{buildroot}%{kernel_source_dir}/.missing-syscalls.d
rm -rf %{buildroot}%{kernel_source_dir}/.version
rm -rf %{buildroot}%{kernel_source_dir}/.mailmap
# install devel files
install -d -m 755 %{buildroot}%{kernel_devel_dir}
for i in $(find . -name 'Makefile*'); do
cp -R --parents $i %{buildroot}%{kernel_devel_dir};
done
for i in $(find . -name 'Kconfig*' -o -name 'Kbuild*'); do
cp -R --parents $i %{buildroot}%{kernel_devel_dir};
done
cp -fR include %{buildroot}%{kernel_devel_dir}
cp -fR scripts %{buildroot}%{kernel_devel_dir}
%ifarch %{ix86} x86_64
cp -fR arch/x86/kernel/asm-offsets.{c,s} \
%{buildroot}%{kernel_devel_dir}/arch/x86/kernel/
cp -fR arch/x86/kernel/asm-offsets_{32,64}.c \
%{buildroot}%{kernel_devel_dir}/arch/x86/kernel/
cp -fR arch/x86/include %{buildroot}%{kernel_devel_dir}/arch/x86/
%else
cp -fR arch/%{target_arch}/kernel/asm-offsets.{c,s} \
%{buildroot}%{kernel_devel_dir}/arch/%{target_arch}/kernel/
cp -fR arch/%{target_arch}/include \
%{buildroot}%{kernel_devel_dir}/arch/%{target_arch}/
%endif
cp -fR .config Module.symvers %{buildroot}%{kernel_devel_dir}
# Needed for truecrypt build (Danny)
cp -fR drivers/md/dm.h %{buildroot}%{kernel_devel_dir}/drivers/md/
# Needed for external dvb tree (#41418)
cp -fR drivers/media/dvb/dvb-core/*.h \
%{buildroot}%{kernel_devel_dir}/drivers/media/dvb/dvb-core/
cp -fR drivers/media/dvb/frontends/lgdt330x.h \
%{buildroot}%{kernel_devel_dir}/drivers/media/dvb/frontends/
# add acpica header files, needed for fglrx build
cp -fR drivers/acpi/acpica/*.h \
%{buildroot}%{kernel_devel_dir}/drivers/acpi/acpica/
# disable mrproper
patch -p1 -d %{buildroot}%{kernel_devel_dir} -i %{SOURCE12}
# disable bounds.h and asm-offsets.h removal
patch -p1 -d %{buildroot}%{kernel_devel_dir} -i %{SOURCE13}
%post %{kernel_package_string}
/sbin/installkernel %{kernel_file_string}
pushd /boot > /dev/null
if [ -L vmlinuz-xen ]; then
rm -f vmlinuz-xen
fi
ln -sf vmlinuz-%{kernel_file_string} vmlinuz-xen
if [ -L initrd-xen.img ]; then
rm -f initrd-xen.img
fi
ln -sf initrd-%{kernel_file_string}.img initrd-xen.img
popd > /dev/null
%postun %{kernel_package_string}
/sbin/installkernel -R %{kernel_file_string}
pushd /boot > /dev/null
if [ -L vmlinuz-xen ]; then
if [ "$(readlink vmlinuz-xen)" = "vmlinuz-%{kernel_file_string}" ]; then
rm -f vmlinuz-xen
fi
fi
if [ -L initrd-xen.img ]; then
if [ "$(readlink initrd-xen.img)" = "initrd-%{kernel_file_string}.img" ]; then
rm -f initrd-xen.img
fi
fi
popd > /dev/null
%post devel-%{kernel_package_string}
if [ -d /lib/modules/%{kernel_file_string} ]; then
ln -sf %{kernel_devel_dir} /lib/modules/%{kernel_file_string}/build
ln -sf %{kernel_devel_dir} /lib/modules/%{kernel_file_string}/source
fi
%preun devel-%{kernel_package_string}
if [ -L /lib/modules/%{kernel_file_string}/build ]; then
rm -f /lib/modules/%{kernel_file_string}/build
fi
if [ -L /lib/modules/%{kernel_file_string}/source ]; then
rm -f /lib/modules/%{kernel_file_string}/source
fi
%post source-%{kernel_package_string}
if [ -d /lib/modules/%{kernel_file_string} ]; then
ln -sf %{kernel_source_dir} /lib/modules/%{kernel_file_string}/build
ln -sf %{kernel_source_dir} /lib/modules/%{kernel_file_string}/source
fi
%preun source-%{kernel_package_string}
if [ -L /lib/modules/%{kernel_file_string}/build ]; then
rm -f /lib/modules/%{kernel_file_string}/build
fi
if [ -L /lib/modules/%{kernel_file_string}/source ]; then
rm -f /lib/modules/%{kernel_file_string}/source
fi
%clean
rm -rf %{buildroot}
%files -n kernel-xen-%{kernel_package_string} -f no_kernel_debug_files.list
%defattr(-,root,root)
/lib/modules/%{kernel_file_string}
/boot/System.map-%{kernel_file_string}
/boot/config-%{kernel_file_string}
/boot/vmlinuz-%{kernel_file_string}
%files -n kernel-xen-devel-%{kernel_package_string}
%defattr(-,root,root)
%{kernel_devel_dir}
%files -n kernel-xen-source-%{kernel_package_string}
%defattr(-,root,root)
%{kernel_source_dir}
%exclude %{kernel_source_dir}/Documentation
%files -n kernel-xen-doc-%{kernel_package_string}
%defattr(-,root,root)
%{kernel_source_dir}/Documentation
%files -n kernel-xen-debug-%{kernel_package_string} -f kernel_debug_files.list
%defattr(-,root,root)
%changelog
* Mon Apr 05 2010 Guillaume Rousse <guillomovitch@mandriva.org> 2.6.32.11-2mdv2010.1
+ Revision: 531739
- exclude patch backup files from sources
* Sun Apr 04 2010 Guillaume Rousse <guillomovitch@mandriva.org> 2.6.32.11-1mdv2010.1
+ Revision: 531429
- switch to SUSE SLE11-SP1 branch, for easier maintenance
- revert to 2.6.32
- new version
* Mon Mar 15 2010 Guillaume Rousse <guillomovitch@mandriva.org> 2.6.33-1mdv2010.1
+ Revision: 519112
- new version
- switch to 2.6.31.12
- sync configuration with default kernel-server
- new kernel version
- new patchset
- set fuzziness level to 2, it's too painful to rediff patches
* Sat Nov 07 2009 Guillaume Rousse <guillomovitch@mandriva.org> 2.6.30.2-7mdv2010.1
+ Revision: 462699
- standard -devel, -source and -doc packages
- ensure kernel-devel contains actual sources of this kernel, not just vanilla
sources
- kernel-devel doesn't require kernel itself
* Wed Oct 14 2009 Pascal Terjan <pterjan@mandriva.org> 2.6.30.2-6mdv2010.0
+ Revision: 457357
- We need xen-hypervisor, not xen
- Create unversioned links
- Require(post) xen, otherwise the bootloader config ends up wrong
- Remove bootloader entries on removal
- Create unversioned links in /boot
- Require bootloader-utils and mkinitrd for post
* Thu Oct 08 2009 Guillaume Rousse <guillomovitch@mandriva.org> 2.6.30.2-5mdv2010.0
+ Revision: 456092
- don't ship kernel modules debug files in main kernel (spotted by buchan)
* Thu Oct 01 2009 Guillaume Rousse <guillomovitch@mandriva.org> 2.6.30.2-4mdv2010.0
+ Revision: 452315
- fix build with gcc 4.3
* Thu Oct 01 2009 Guillaume Rousse <guillomovitch@mandriva.org> 2.6.30.2-3mdv2010.0
+ Revision: 452224
- install files manually, 'make install' is too fragile
* Sat Sep 26 2009 Guillaume Rousse <guillomovitch@mandriva.org> 2.6.30.2-2mdv2010.0
+ Revision: 449505
- don't use parallel make invocation for installation
- post-installation initrd and bootloader handling
- drop %%apply_patch macro, it's not backportable
* Tue Sep 01 2009 Pascal Terjan <pterjan@mandriva.org> 2.6.30.2-1mdv2010.0
+ Revision: 423653
- version files in /boot
+ Guillaume Rousse <guillomovitch@mandriva.org>
- import kernel-xen
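The %post/%postun scriptlets in the spec above maintain unversioned vmlinuz-xen and initrd-xen.img links in /boot, removing them on uninstall only if they still point at the kernel being removed. A minimal standalone sketch of that symlink logic, run against a scratch directory instead of /boot (the kver value is illustrative, and the real scriptlets also run installkernel):

```shell
# Sketch of the /boot symlink handling from %post/%postun above,
# exercised in a scratch directory. kver stands in for
# %{kernel_file_string}.
boot=$(mktemp -d)
kver="2.6.32.11-xen-2mdv"
touch "$boot/vmlinuz-$kver"
cd "$boot"

# %post: drop any stale unversioned link, then point it at this kernel
if [ -L vmlinuz-xen ]; then
    rm -f vmlinuz-xen
fi
ln -sf "vmlinuz-$kver" vmlinuz-xen
target_after_post=$(readlink vmlinuz-xen)

# %postun: remove the link only if it still targets the removed kernel
if [ -L vmlinuz-xen ]; then
    if [ "$(readlink vmlinuz-xen)" = "vmlinuz-$kver" ]; then
        rm -f vmlinuz-xen
    fi
fi
cd - > /dev/null
```

The readlink guard in %postun is what keeps an upgrade safe: if a newer kernel's %post has already retargeted the link, the old package's %postun leaves it alone.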

linux-2.6.19-rc1-kexec-move_segment_code-i386.patch (new file, 155 lines)
Subject: kexec: Move asm segment handling code to the assembly file (i386)
From: http://xenbits.xensource.com/xen-unstable.hg (tip 13816)
Patch-mainline: obsolete
This patch moves the idt, gdt, and segment handling code from machine_kexec.c
to relocate_kernel.S. The main reason behind this move is to avoid code
duplication in the Xen hypervisor. With this patch all code required to kexec
is put on the control page.
On top of that this patch also counts as a cleanup - I think it is much
nicer to write assembly directly in assembly files than wrap inline assembly
in C functions for no apparent reason.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Acked-by: jbeulich@novell.com
Applies to 2.6.19-rc1.
jb: fixed up register usage (paralleling what's needed for 2.6.30 on x86-64)
--- head-2009-10-06.orig/arch/x86/kernel/machine_kexec_32.c 2009-10-12 10:33:15.000000000 +0200
+++ head-2009-10-06/arch/x86/kernel/machine_kexec_32.c 2009-04-21 10:33:15.000000000 +0200
@@ -26,48 +26,6 @@
#include <asm/system.h>
#include <asm/cacheflush.h>
-static void set_idt(void *newidt, __u16 limit)
-{
- struct desc_ptr curidt;
-
- /* ia32 supports unaliged loads & stores */
- curidt.size = limit;
- curidt.address = (unsigned long)newidt;
-
- load_idt(&curidt);
-}
-
-
-static void set_gdt(void *newgdt, __u16 limit)
-{
- struct desc_ptr curgdt;
-
- /* ia32 supports unaligned loads & stores */
- curgdt.size = limit;
- curgdt.address = (unsigned long)newgdt;
-
- load_gdt(&curgdt);
-}
-
-static void load_segments(void)
-{
-#define __STR(X) #X
-#define STR(X) __STR(X)
-
- __asm__ __volatile__ (
- "\tljmp $"STR(__KERNEL_CS)",$1f\n"
- "\t1:\n"
- "\tmovl $"STR(__KERNEL_DS)",%%eax\n"
- "\tmovl %%eax,%%ds\n"
- "\tmovl %%eax,%%es\n"
- "\tmovl %%eax,%%fs\n"
- "\tmovl %%eax,%%gs\n"
- "\tmovl %%eax,%%ss\n"
- : : : "eax", "memory");
-#undef STR
-#undef __STR
-}
-
static void machine_kexec_free_page_tables(struct kimage *image)
{
free_page((unsigned long)image->arch.pgd);
@@ -228,24 +186,6 @@ void machine_kexec(struct kimage *image)
page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
<< PAGE_SHIFT);
- /*
- * The segment registers are funny things, they have both a
- * visible and an invisible part. Whenever the visible part is
- * set to a specific selector, the invisible part is loaded
- * with from a table in memory. At no other time is the
- * descriptor table in memory accessed.
- *
- * I take advantage of this here by force loading the
- * segments, before I zap the gdt with an invalid value.
- */
- load_segments();
- /*
- * The gdt & idt are now invalid.
- * If you want to load them you must set up your own idt & gdt.
- */
- set_gdt(phys_to_virt(0), 0);
- set_idt(phys_to_virt(0), 0);
-
/* now call it */
image->start = relocate_kernel_ptr((unsigned long)image->head,
(unsigned long)page_list,
--- head-2009-10-06.orig/arch/x86/kernel/relocate_kernel_32.S 2009-10-12 10:33:15.000000000 +0200
+++ head-2009-10-06/arch/x86/kernel/relocate_kernel_32.S 2009-10-12 10:39:36.000000000 +0200
@@ -87,14 +87,32 @@ relocate_kernel:
movl PTR(PA_PGD)(%ebp), %eax
movl %eax, %cr3
+ /* setup idt */
+ lidtl idt_48 - relocate_kernel(%edi)
+
+ /* setup gdt */
+ leal gdt - relocate_kernel(%edi), %eax
+ movl %eax, (gdt_48 - relocate_kernel) + 2(%edi)
+ lgdtl gdt_48 - relocate_kernel(%edi)
+
+ /* setup data segment registers */
+ mov $(gdt_ds - gdt), %eax
+ mov %eax, %ds
+ mov %eax, %es
+ mov %eax, %fs
+ mov %eax, %gs
+ mov %eax, %ss
+
/* setup a new stack at the end of the physical control page */
lea PAGE_SIZE(%edi), %esp
- /* jump to identity mapped page */
+ /* load new code segment and jump to identity mapped page */
+ pushl $0
+ pushl $(gdt_cs - gdt)
movl %edi, %eax
addl $(identity_mapped - relocate_kernel), %eax
pushl %eax
- ret
+ iretl
identity_mapped:
/* store the start address on the stack */
@@ -271,5 +289,22 @@ swap_pages:
popl %ebp
ret
+ .align 16
+gdt:
+ .quad 0x0000000000000000 /* NULL descriptor */
+gdt_cs:
+ .quad 0x00cf9a000000ffff /* kernel 4GB code at 0x00000000 */
+gdt_ds:
+ .quad 0x00cf92000000ffff /* kernel 4GB data at 0x00000000 */
+gdt_end:
+
+gdt_48:
+ .word gdt_end - gdt - 1 /* limit */
+ .long 0 /* base - filled in by code above */
+
+idt_48:
+ .word 0 /* limit */
+ .long 0 /* base */
+
.globl kexec_control_code_size
.set kexec_control_code_size, . - relocate_kernel

linux-2.6.19-rc1-kexec-move_segment_code-x86_64.patch (new file, 150 lines)
Subject: kexec: Move asm segment handling code to the assembly file (x86_64)
From: http://xenbits.xensource.com/xen-unstable.hg (tip 13816)
Patch-mainline: obsolete
This patch moves the idt, gdt, and segment handling code from machine_kexec.c
to relocate_kernel.S. The main reason behind this move is to avoid code
duplication in the Xen hypervisor. With this patch all code required to kexec
is put on the control page.
On top of that this patch also counts as a cleanup - I think it is much
nicer to write assembly directly in assembly files than wrap inline assembly
in C functions for no apparent reason.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Acked-by: jbeulich@novell.com
Applies to 2.6.19-rc1.
jb: fixed up register usage for 2.6.30 (bnc#545206)
--- head-2009-10-06.orig/arch/x86/kernel/machine_kexec_64.c 2009-10-12 10:17:22.000000000 +0200
+++ head-2009-10-06/arch/x86/kernel/machine_kexec_64.c 2009-04-21 10:35:13.000000000 +0200
@@ -201,47 +201,6 @@ static int init_pgtable(struct kimage *i
return init_transition_pgtable(image, level4p);
}
-static void set_idt(void *newidt, u16 limit)
-{
- struct desc_ptr curidt;
-
- /* x86-64 supports unaliged loads & stores */
- curidt.size = limit;
- curidt.address = (unsigned long)newidt;
-
- __asm__ __volatile__ (
- "lidtq %0\n"
- : : "m" (curidt)
- );
-};
-
-
-static void set_gdt(void *newgdt, u16 limit)
-{
- struct desc_ptr curgdt;
-
- /* x86-64 supports unaligned loads & stores */
- curgdt.size = limit;
- curgdt.address = (unsigned long)newgdt;
-
- __asm__ __volatile__ (
- "lgdtq %0\n"
- : : "m" (curgdt)
- );
-};
-
-static void load_segments(void)
-{
- __asm__ __volatile__ (
- "\tmovl %0,%%ds\n"
- "\tmovl %0,%%es\n"
- "\tmovl %0,%%ss\n"
- "\tmovl %0,%%fs\n"
- "\tmovl %0,%%gs\n"
- : : "a" (__KERNEL_DS) : "memory"
- );
-}
-
int machine_kexec_prepare(struct kimage *image)
{
unsigned long start_pgtable;
@@ -308,24 +267,6 @@ void machine_kexec(struct kimage *image)
page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
<< PAGE_SHIFT);
- /*
- * The segment registers are funny things, they have both a
- * visible and an invisible part. Whenever the visible part is
- * set to a specific selector, the invisible part is loaded
- * with from a table in memory. At no other time is the
- * descriptor table in memory accessed.
- *
- * I take advantage of this here by force loading the
- * segments, before I zap the gdt with an invalid value.
- */
- load_segments();
- /*
- * The gdt & idt are now invalid.
- * If you want to load them you must set up your own idt & gdt.
- */
- set_gdt(phys_to_virt(0), 0);
- set_idt(phys_to_virt(0), 0);
-
/* now call it */
image->start = relocate_kernel((unsigned long)image->head,
(unsigned long)page_list,
--- head-2009-10-06.orig/arch/x86/kernel/relocate_kernel_64.S 2009-10-12 10:17:22.000000000 +0200
+++ head-2009-10-06/arch/x86/kernel/relocate_kernel_64.S 2009-10-12 10:32:00.000000000 +0200
@@ -91,13 +91,30 @@ relocate_kernel:
/* Switch to the identity mapped page tables */
movq %r9, %cr3
+ /* setup idt */
+ lidtq idt_80 - relocate_kernel(%r8)
+
+ /* setup gdt */
+ leaq gdt - relocate_kernel(%r8), %rax
+ movq %rax, (gdt_80 - relocate_kernel) + 2(%r8)
+ lgdtq gdt_80 - relocate_kernel(%r8)
+
+ /* setup data segment registers */
+ xorl %eax, %eax
+ movl %eax, %ds
+ movl %eax, %es
+ movl %eax, %fs
+ movl %eax, %gs
+ movl %eax, %ss
+
/* setup a new stack at the end of the physical control page */
lea PAGE_SIZE(%r8), %rsp
- /* jump to identity mapped page */
+ /* load new code segment and jump to identity mapped page */
addq $(identity_mapped - relocate_kernel), %r8
+ pushq $(gdt_cs - gdt)
pushq %r8
- ret
+ lretq
identity_mapped:
/* store the start address on the stack */
@@ -262,5 +279,20 @@ swap_pages:
3:
ret
+ .align 16
+gdt:
+ .quad 0x0000000000000000 /* NULL descriptor */
+gdt_cs:
+ .quad 0x00af9a000000ffff
+gdt_end:
+
+gdt_80:
+ .word gdt_end - gdt - 1 /* limit */
+ .quad 0 /* base - filled in by code above */
+
+idt_80:
+ .word 0 /* limit */
+ .quad 0 /* base */
+
.globl kexec_control_code_size
.set kexec_control_code_size, . - relocate_kernel
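The x86_64 patch above installs its own minimal GDT on the control page. As a purely illustrative aside (not part of the patch), the code-segment descriptor it defines at gdt_cs, 0x00af9a000000ffff, can be decoded with a few lines of shell arithmetic, using the standard x86 segment-descriptor bit layout:

```shell
# Decode the gdt_cs descriptor quoted in the patch above
# (0x00af9a000000ffff); bit positions follow the usual x86
# segment-descriptor layout. Illustration only.
desc=$(( 0x00af9a000000ffff ))
type=$(( (desc >> 40) & 0xf ))   # 0xa: execute/read code segment
s=$((    (desc >> 44) & 0x1 ))   # 1: code/data (non-system) descriptor
dpl=$((  (desc >> 45) & 0x3 ))   # 0: ring 0
p=$((    (desc >> 47) & 0x1 ))   # 1: segment present
l=$((    (desc >> 53) & 0x1 ))   # 1: 64-bit (long mode) code segment
g=$((    (desc >> 55) & 0x1 ))   # 1: 4 KiB granularity
echo "type=$type s=$s dpl=$dpl p=$p l=$l g=$g"
```

The set L bit is the point of this descriptor: after the lretq, the CPU is executing identity-mapped 64-bit code regardless of what the old kernel GDT contained.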

pci-guestdev (new file, 2645 lines; diff suppressed because it is too large)

pci-reserve (new file, 236 lines)
Subject: linux/pci: reserve io/memory space for bridge
From: http://xenbits.xensource.com/linux-2.6.18-xen.hg (tip 1010:10eae161c153)
Patch-mainline: n/a
reserve io/memory space for bridge which will be used later
by PCI hotplug.
Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Acked-by: jbeulich@novell.com
--- sle11sp1-2010-03-11.orig/Documentation/kernel-parameters.txt 2010-03-11 09:11:45.000000000 +0100
+++ sle11sp1-2010-03-11/Documentation/kernel-parameters.txt 2010-03-11 09:11:54.000000000 +0100
@@ -1994,6 +1994,13 @@ and is between 256 and 4096 characters.
off: Turn ECRC off
on: Turn ECRC on.
+ pci_reserve= [PCI]
+ Format: [<sbdf>[+IO<size>][+MEM<size>]][,<sbdf>...]
+ Format of sbdf: [<segment>:]<bus>:<dev>.<func>
+ Specifies the least reserved io size or memory size
+ which is assigned to PCI bridge even when no child
+ pci device exists. This is useful with PCI hotplug.
+
pcie_aspm= [PCIE] Forcibly enable or disable PCIe Active State Power
Management.
off Disable ASPM.
--- sle11sp1-2010-03-11.orig/drivers/pci/Kconfig 2009-12-04 10:27:46.000000000 +0100
+++ sle11sp1-2010-03-11/drivers/pci/Kconfig 2009-12-04 10:27:49.000000000 +0100
@@ -56,6 +56,13 @@ config PCI_IOMULTI
help
Say Y here if you need io multiplexing.
+config PCI_RESERVE
+ bool "PCI IO/MEMORY space reserve"
+ depends on PCI && XEN_PRIVILEGED_GUEST
+ default y
+ help
+ Say Y here if you need PCI IO/MEMORY space reserve
+
config PCI_STUB
tristate "PCI Stub driver"
depends on PCI
--- sle11sp1-2010-03-11.orig/drivers/pci/Makefile 2009-12-04 10:27:46.000000000 +0100
+++ sle11sp1-2010-03-11/drivers/pci/Makefile 2009-12-04 10:27:49.000000000 +0100
@@ -9,6 +9,7 @@ obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_SYSFS) += slot.o
obj-$(CONFIG_PCI_GUESTDEV) += guestdev.o
obj-$(CONFIG_PCI_IOMULTI) += iomulti.o
+obj-$(CONFIG_PCI_RESERVE) += reserve.o
obj-$(CONFIG_PCI_LEGACY) += legacy.o
CFLAGS_legacy.o += -Wno-deprecated-declarations
--- sle11sp1-2010-03-11.orig/drivers/pci/pci.h 2009-12-04 10:27:46.000000000 +0100
+++ sle11sp1-2010-03-11/drivers/pci/pci.h 2009-12-04 10:27:49.000000000 +0100
@@ -318,4 +318,19 @@ extern int pci_is_iomuldev(struct pci_de
#define pci_is_iomuldev(dev) 0
#endif
+#ifdef CONFIG_PCI_RESERVE
+unsigned long pci_reserve_size_io(struct pci_bus *bus);
+unsigned long pci_reserve_size_mem(struct pci_bus *bus);
+#else
+static inline unsigned long pci_reserve_size_io(struct pci_bus *bus)
+{
+ return 0;
+}
+
+static inline unsigned long pci_reserve_size_mem(struct pci_bus *bus)
+{
+ return 0;
+}
+#endif /* CONFIG_PCI_RESERVE */
+
#endif /* DRIVERS_PCI_H */
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ sle11sp1-2010-03-11/drivers/pci/reserve.c 2010-03-24 14:00:05.000000000 +0100
@@ -0,0 +1,138 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * Copyright (c) 2009 Isaku Yamahata
+ * VA Linux Systems Japan K.K.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/pci.h>
+
+#include <asm/setup.h>
+
+static char pci_reserve_param[COMMAND_LINE_SIZE];
+
+/* pci_reserve= [PCI]
+ * Format: [<sbdf>[+IO<size>][+MEM<size>]][,<sbdf>...]
+ * Format of sbdf: [<segment>:]<bus>:<dev>.<func>
+ */
+static int pci_reserve_parse_size(const char *str,
+ unsigned long *io_size,
+ unsigned long *mem_size)
+{
+ if (sscanf(str, "io%lx", io_size) == 1 ||
+ sscanf(str, "IO%lx", io_size) == 1)
+ return 0;
+
+ if (sscanf(str, "mem%lx", mem_size) == 1 ||
+ sscanf(str, "MEM%lx", mem_size) == 1)
+ return 0;
+
+ return -EINVAL;
+}
+
+static int pci_reserve_parse_one(const char *str,
+ int *seg, int *bus, int *dev, int *func,
+ unsigned long *io_size,
+ unsigned long *mem_size)
+{
+ char *p;
+
+ *io_size = 0;
+ *mem_size = 0;
+
+ if (sscanf(str, "%x:%x:%x.%x", seg, bus, dev, func) != 4) {
+ *seg = 0;
+ if (sscanf(str, "%x:%x.%x", bus, dev, func) != 3) {
+ return -EINVAL;
+ }
+ }
+
+ p = strchr(str, '+');
+ if (p == NULL)
+ return -EINVAL;
+ if (pci_reserve_parse_size(++p, io_size, mem_size))
+ return -EINVAL;
+
+ p = strchr(p, '+');
+ return p ? pci_reserve_parse_size(p + 1, io_size, mem_size) : 0;
+}
+
+static unsigned long pci_reserve_size(struct pci_bus *pbus, int flags)
+{
+ char *sp;
+ char *ep;
+
+ int seg;
+ int bus;
+ int dev;
+ int func;
+
+ unsigned long io_size;
+ unsigned long mem_size;
+
+ sp = pci_reserve_param;
+
+ do {
+ ep = strchr(sp, ',');
+ if (ep)
+ *ep = '\0'; /* chomp */
+
+ if (pci_reserve_parse_one(sp, &seg, &bus, &dev, &func,
+ &io_size, &mem_size) == 0) {
+ if (pci_domain_nr(pbus) == seg &&
+ pbus->number == bus &&
+ PCI_SLOT(pbus->self->devfn) == dev &&
+ PCI_FUNC(pbus->self->devfn) == func) {
+ switch (flags) {
+ case IORESOURCE_IO:
+ return io_size;
+ case IORESOURCE_MEM:
+ return mem_size;
+ default:
+ break;
+ }
+ }
+ }
+
+ if (ep) {
+ *ep = ','; /* restore chomp'ed ',' for later */
+ ep++;
+ }
+ sp = ep;
+ } while (ep);
+
+ return 0;
+}
+
+unsigned long pci_reserve_size_io(struct pci_bus *pbus)
+{
+ return pci_reserve_size(pbus, IORESOURCE_IO);
+}
+
+unsigned long pci_reserve_size_mem(struct pci_bus *pbus)
+{
+ return pci_reserve_size(pbus, IORESOURCE_MEM);
+}
+
+static int __init pci_reserve_setup(char *str)
+{
+ if (strlen(str) >= sizeof(pci_reserve_param))
+ return 0;
+ strlcpy(pci_reserve_param, str, sizeof(pci_reserve_param));
+ return 1;
+}
+__setup("pci_reserve=", pci_reserve_setup);
--- sle11sp1-2010-03-11.orig/drivers/pci/setup-bus.c 2010-03-11 09:10:12.000000000 +0100
+++ sle11sp1-2010-03-11/drivers/pci/setup-bus.c 2010-03-11 09:12:00.000000000 +0100
@@ -337,7 +337,7 @@ static void pbus_size_io(struct pci_bus
#if defined(CONFIG_ISA) || defined(CONFIG_EISA)
size = (size & 0xff) + ((size & ~0xffUL) << 2);
#endif
- size = ALIGN(size + size1, 4096);
+ size = ALIGN(max(size + size1, pci_reserve_size_io(bus)), 4096);
if (!size) {
b_res->flags = 0;
return;
@@ -417,7 +417,8 @@ static int pbus_size_mem(struct pci_bus
min_align = align1 >> 1;
align += aligns[order];
}
- size = ALIGN(size, min_align);
+ size = ALIGN(max(size, (resource_size_t)pci_reserve_size_mem(bus)),
+ min_align);
if (!size) {
b_res->flags = 0;
return 1;
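The pci_reserve= boot parameter introduced by this patch follows the format documented in its kernel-parameters.txt hunk. A quick shell sketch of composing such a value and splitting it the way pci_reserve_size() walks the comma-separated list (the device address and sizes here are made-up examples; the kernel parses the IO/MEM sizes as hex via sscanf "io%lx"/"mem%lx"):

```shell
# Hypothetical pci_reserve= value in the documented format:
#   [<sbdf>[+IO<size>][+MEM<size>]][,<sbdf>...]
param="pci_reserve=0000:00:1c.0+IO1000+MEM100000,0000:00:1c.4+IO1000"

# Split it the way the parser walks the list:
spec=${param#pci_reserve=}
first=${spec%%,*}        # first entry
rest=${spec#*,}          # remaining entries
sbdf=${first%%+*}        # [<segment>:]<bus>:<dev>.<func>
sizes=${first#*+}        # "+IO..."/"+MEM..." suffixes
echo "sbdf=$sbdf sizes=$sizes rest=$rest"
```

In practice the value would simply be appended to the dom0 kernel line in the bootloader configuration.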

sfc-driverlink (new file, 1133 lines; diff suppressed because it is too large)

sfc-driverlink-conditional (new file, 248 lines)
From: jbeulich@novell.com
Subject: conditionalize driverlink additions to Solarflare driver
Patch-mainline: obsolete
References: FATE#303479
At once converted the EFX_TRACE() invocations after vetoed RX/TX
callbacks to ...LOG() ones, which is consistent with Solarflare's
current code according to David Riddoch (2008-09-12).
--- head-2009-11-06.orig/drivers/net/sfc/Kconfig 2009-04-21 11:02:22.000000000 +0200
+++ head-2009-11-06/drivers/net/sfc/Kconfig 2009-10-12 13:41:03.000000000 +0200
@@ -12,8 +12,12 @@ config SFC
To compile this driver as a module, choose M here. The module
will be called sfc.
+config SFC_DRIVERLINK
+ bool
+
config SFC_RESOURCE
depends on SFC && X86
+ select SFC_DRIVERLINK
tristate "Solarflare Solarstorm SFC4000 resource driver"
help
This module provides the SFC resource manager driver.
--- head-2009-11-06.orig/drivers/net/sfc/Makefile 2009-02-06 12:42:18.000000000 +0100
+++ head-2009-11-06/drivers/net/sfc/Makefile 2009-10-12 13:41:03.000000000 +0200
@@ -1,7 +1,7 @@
sfc-y += efx.o falcon.o tx.o rx.o falcon_gmac.o \
falcon_xmac.o selftest.o ethtool.o xfp_phy.o \
- mdio_10g.o tenxpress.o boards.o sfe4001.o \
- driverlink.o
+ mdio_10g.o tenxpress.o boards.o sfe4001.o
+sfc-$(CONFIG_SFC_DRIVERLINK) += driverlink.o
sfc-$(CONFIG_SFC_MTD) += mtd.o
obj-$(CONFIG_SFC) += sfc.o
--- head-2009-11-06.orig/drivers/net/sfc/driverlink.c 2009-07-28 10:04:25.000000000 +0200
+++ head-2009-11-06/drivers/net/sfc/driverlink.c 2009-10-12 13:41:03.000000000 +0200
@@ -14,7 +14,6 @@
#include <linux/rtnetlink.h>
#include "net_driver.h"
#include "efx.h"
-#include "driverlink_api.h"
#include "driverlink.h"
/* Protects @efx_driverlink_lock and @efx_driver_list */
--- head-2009-11-06.orig/drivers/net/sfc/driverlink.h 2009-07-28 10:04:25.000000000 +0200
+++ head-2009-11-06/drivers/net/sfc/driverlink.h 2009-10-12 13:41:03.000000000 +0200
@@ -15,6 +15,10 @@
struct efx_dl_device;
struct efx_nic;
+#ifdef CONFIG_SFC_DRIVERLINK
+
+#include "driverlink_api.h"
+
/* Efx callback devices
*
* A list of the devices that own each callback. The partner to
@@ -40,4 +44,23 @@ extern void efx_dl_unregister_nic(struct
extern void efx_dl_reset_suspend(struct efx_nic *efx);
extern void efx_dl_reset_resume(struct efx_nic *efx, int ok);
+#define EFX_DL_LOG EFX_LOG
+
+#else /* CONFIG_SFC_DRIVERLINK */
+
+enum efx_veto { EFX_ALLOW_PACKET = 0 };
+
+static inline int efx_nop_callback(struct efx_nic *efx) { return 0; }
+#define EFX_DL_CALLBACK(port, name, ...) efx_nop_callback(port)
+
+static inline int efx_dl_register_nic(struct efx_nic *efx) { return 0; }
+static inline void efx_dl_unregister_nic(struct efx_nic *efx) {}
+
+static inline void efx_dl_reset_suspend(struct efx_nic *efx) {}
+static inline void efx_dl_reset_resume(struct efx_nic *efx, int ok) {}
+
+#define EFX_DL_LOG(efx, fmt, args...) ((void)(efx))
+
+#endif /* CONFIG_SFC_DRIVERLINK */
+
#endif /* EFX_DRIVERLINK_H */
--- head-2009-11-06.orig/drivers/net/sfc/efx.c 2009-10-12 13:40:25.000000000 +0200
+++ head-2009-11-06/drivers/net/sfc/efx.c 2009-10-12 13:41:03.000000000 +0200
@@ -1689,6 +1689,7 @@ static void efx_unregister_netdev(struct
* Device reset and suspend
*
**************************************************************************/
+#ifdef CONFIG_SFC_DRIVERLINK
/* Serialise access to the driverlink callbacks, by quiescing event processing
* (without flushing the descriptor queues), and acquiring the rtnl_lock */
void efx_suspend(struct efx_nic *efx)
@@ -1706,6 +1707,7 @@ void efx_resume(struct efx_nic *efx)
efx_start_all(efx);
rtnl_unlock();
}
+#endif
/* Tears down the entire software state and most of the hardware state
* before reset. */
@@ -1978,9 +1980,11 @@ static int efx_init_struct(struct efx_ni
efx->mac_op = &efx_dummy_mac_operations;
efx->phy_op = &efx_dummy_phy_operations;
efx->mdio.dev = net_dev;
+#ifdef CONFIG_SFC_DRIVERLINK
INIT_LIST_HEAD(&efx->dl_node);
INIT_LIST_HEAD(&efx->dl_device_list);
efx->dl_cb = efx_default_callbacks;
+#endif
INIT_WORK(&efx->phy_work, efx_phy_work);
INIT_WORK(&efx->mac_work, efx_mac_work);
atomic_set(&efx->netif_stop_count, 1);
--- head-2009-11-06.orig/drivers/net/sfc/falcon.c 2009-07-28 10:04:25.000000000 +0200
+++ head-2009-11-06/drivers/net/sfc/falcon.c 2009-10-12 13:41:03.000000000 +0200
@@ -36,6 +36,7 @@
/**
* struct falcon_nic_data - Falcon NIC state
+ * @next_buffer_table: First available buffer table id
* @resources: Resource information for driverlink client
* @pci_dev2: The secondary PCI device if present
* @i2c_data: Operations and state for I2C bit-bashing algorithm
@@ -43,7 +44,11 @@
* @int_error_expire: Time at which error count will be expired
*/
struct falcon_nic_data {
+#ifndef CONFIG_SFC_DRIVERLINK
+ unsigned next_buffer_table;
+#else
struct efx_dl_falcon_resources resources;
+#endif
struct pci_dev *pci_dev2;
struct i2c_algo_bit_data i2c_data;
@@ -336,8 +341,13 @@ static int falcon_alloc_special_buffer(s
memset(buffer->addr, 0xff, len);
/* Select new buffer ID */
+#ifndef CONFIG_SFC_DRIVERLINK
+ buffer->index = nic_data->next_buffer_table;
+ nic_data->next_buffer_table += buffer->entries;
+#else
buffer->index = nic_data->resources.buffer_table_min;
nic_data->resources.buffer_table_min += buffer->entries;
+#endif
EFX_LOG(efx, "allocating special buffers %d-%d at %llx+%x "
"(virt %p phys %llx)\n", buffer->index,
@@ -2755,6 +2765,7 @@ static int falcon_probe_nvconfig(struct
* should live. */
static int falcon_dimension_resources(struct efx_nic *efx)
{
+#ifdef CONFIG_SFC_DRIVERLINK
unsigned internal_dcs_entries;
struct falcon_nic_data *nic_data = efx->nic_data;
struct efx_dl_falcon_resources *res = &nic_data->resources;
@@ -2799,6 +2810,7 @@ static int falcon_dimension_resources(st
if (EFX_INT_MODE_USE_MSI(efx))
res->flags |= EFX_DL_FALCON_USE_MSI;
+#endif
return 0;
}
@@ -2962,7 +2974,9 @@ int falcon_probe_nic(struct efx_nic *efx
return 0;
fail6:
+#ifdef CONFIG_SFC_DRIVERLINK
efx->dl_info = NULL;
+#endif
fail5:
falcon_remove_spi_devices(efx);
falcon_free_buffer(efx, &efx->irq_status);
@@ -3150,7 +3164,9 @@ void falcon_remove_nic(struct efx_nic *e
/* Tear down the private nic state */
kfree(efx->nic_data);
efx->nic_data = NULL;
+#ifdef CONFIG_SFC_DRIVERLINK
efx->dl_info = NULL;
+#endif
}
void falcon_update_nic_stats(struct efx_nic *efx)
--- head-2009-11-06.orig/drivers/net/sfc/net_driver.h 2009-07-28 10:04:25.000000000 +0200
+++ head-2009-11-06/drivers/net/sfc/net_driver.h 2009-10-12 13:41:03.000000000 +0200
@@ -29,7 +29,6 @@
#include "enum.h"
#include "bitfield.h"
-#include "driverlink_api.h"
#include "driverlink.h"
/**************************************************************************
@@ -854,11 +853,13 @@ struct efx_nic {
void *loopback_selftest;
const char *silicon_rev;
+#ifdef CONFIG_SFC_DRIVERLINK
struct efx_dl_device_info *dl_info;
struct list_head dl_node;
struct list_head dl_device_list;
struct efx_dl_callbacks dl_cb;
struct efx_dl_cb_devices dl_cb_dev;
+#endif
};
static inline int efx_dev_registered(struct efx_nic *efx)
--- head-2009-11-06.orig/drivers/net/sfc/rx.c 2009-11-06 10:32:03.000000000 +0100
+++ head-2009-11-06/drivers/net/sfc/rx.c 2009-11-06 10:32:24.000000000 +0100
@@ -456,8 +456,8 @@ static void efx_rx_packet_lro(struct efx
* an obvious interface to this, so veto packets before LRO */
veto = EFX_DL_CALLBACK(efx, rx_packet, rx_buf->data, rx_buf->len);
if (unlikely(veto)) {
- EFX_TRACE(efx, "LRO RX vetoed by driverlink %s driver\n",
- efx->dl_cb_dev.rx_packet->driver->name);
+ EFX_DL_LOG(efx, "LRO RX vetoed by driverlink %s driver\n",
+ efx->dl_cb_dev.rx_packet->driver->name);
/* Free the buffer now */
efx_free_rx_buffer(efx, rx_buf);
return;
@@ -579,8 +579,8 @@ void __efx_rx_packet(struct efx_channel
/* Allow callback to veto the packet */
veto = EFX_DL_CALLBACK(efx, rx_packet, rx_buf->data, rx_buf->len);
if (unlikely(veto)) {
- EFX_LOG(efx, "RX vetoed by driverlink %s driver\n",
- efx->dl_cb_dev.rx_packet->driver->name);
+ EFX_DL_LOG(efx, "RX vetoed by driverlink %s driver\n",
+ efx->dl_cb_dev.rx_packet->driver->name);
/* Free the buffer now */
efx_free_rx_buffer(efx, rx_buf);
goto done;
--- head-2009-11-06.orig/drivers/net/sfc/tx.c 2009-10-12 13:40:32.000000000 +0200
+++ head-2009-11-06/drivers/net/sfc/tx.c 2009-10-12 13:41:03.000000000 +0200
@@ -387,9 +387,9 @@ netdev_tx_t efx_hard_start_xmit(struct s
/* See if driverlink wants to veto the packet. */
veto = EFX_DL_CALLBACK(efx, tx_packet, skb);
if (unlikely(veto)) {
- EFX_TRACE(efx, "TX queue %d packet vetoed by "
- "driverlink %s driver\n", tx_queue->queue,
- efx->dl_cb_dev.tx_packet->driver->name);
+ EFX_DL_LOG(efx, "TX queue %d packet vetoed by "
+ "driverlink %s driver\n", tx_queue->queue,
+ efx->dl_cb_dev.tx_packet->driver->name);
/* Free the skb; nothing else will do it */
dev_kfree_skb_any(skb);
return NETDEV_TX_OK;

sfc-endianness: new file, 18 lines

@@ -0,0 +1,18 @@
From: jbeulich@novell.com
Subject: fix building with gcc 4.4
Patch-mainline: obsolete
--- head-2009-05-19.orig/drivers/net/sfc/sfc_resource/ci/efhw/hardware_sysdep.h 2008-07-17 16:18:07.000000000 +0200
+++ head-2009-05-19/drivers/net/sfc/sfc_resource/ci/efhw/hardware_sysdep.h 2009-05-19 15:44:02.000000000 +0200
@@ -42,9 +42,9 @@
#include <linux/io.h>
-#ifdef __LITTLE_ENDIAN
+#if defined(__LITTLE_ENDIAN)
#define EFHW_IS_LITTLE_ENDIAN
-#elif __BIG_ENDIAN
+#elif defined(__BIG_ENDIAN)
#define EFHW_IS_BIG_ENDIAN
#else
#error Unknown endianness

sfc-external-sram: new file, 298 lines

@@ -0,0 +1,298 @@
From: Kieran Mansley <kmansley@solarflare.com>
Subject: enable access to Falcon's external SRAM
References: bnc#489105
Include ability to reference external SRAM on Solarflare Falcon NICs to
allow event queues to be accessed by virtualised guests.
Acked-by: jbeulich@novell.com
--- head-2009-07-28.orig/drivers/net/sfc/falcon.c 2009-07-28 10:05:40.000000000 +0200
+++ head-2009-07-28/drivers/net/sfc/falcon.c 2009-07-28 10:06:53.000000000 +0200
@@ -36,6 +36,9 @@
/**
* struct falcon_nic_data - Falcon NIC state
+ * @sram_cfg: SRAM configuration value
+ * @tx_dc_base: Base address in SRAM of TX queue descriptor caches
+ * @rx_dc_base: Base address in SRAM of RX queue descriptor caches
* @next_buffer_table: First available buffer table id
* @resources: Resource information for driverlink client
* @pci_dev2: The secondary PCI device if present
@@ -44,6 +47,9 @@
* @int_error_expire: Time at which error count will be expired
*/
struct falcon_nic_data {
+ int sram_cfg;
+ unsigned tx_dc_base;
+ unsigned rx_dc_base;
#ifndef CONFIG_SFC_DRIVERLINK
unsigned next_buffer_table;
#else
@@ -74,11 +80,11 @@ static int disable_dma_stats;
*/
#define TX_DC_ENTRIES 16
#define TX_DC_ENTRIES_ORDER 0
-#define TX_DC_BASE 0x130000
+#define TX_DC_INTERNAL_BASE 0x130000
#define RX_DC_ENTRIES 64
#define RX_DC_ENTRIES_ORDER 2
-#define RX_DC_BASE 0x100000
+#define RX_DC_INTERNAL_BASE 0x100000
static const unsigned int
/* "Large" EEPROM device: Atmel AT25640 or similar
@@ -468,9 +474,17 @@ void falcon_push_buffers(struct efx_tx_q
int falcon_probe_tx(struct efx_tx_queue *tx_queue)
{
struct efx_nic *efx = tx_queue->efx;
- return falcon_alloc_special_buffer(efx, &tx_queue->txd,
- FALCON_TXD_RING_SIZE *
- sizeof(efx_qword_t));
+ int rc = falcon_alloc_special_buffer(efx, &tx_queue->txd,
+ FALCON_TXD_RING_SIZE *
+ sizeof(efx_qword_t));
+#ifdef CONFIG_SFC_DRIVERLINK
+ if (rc == 0) {
+ struct falcon_nic_data *nic_data = efx->nic_data;
+ nic_data->resources.txq_min = max(nic_data->resources.txq_min,
+ (unsigned)tx_queue->queue + 1);
+ }
+#endif
+ return rc;
}
void falcon_init_tx(struct efx_tx_queue *tx_queue)
@@ -610,9 +624,17 @@ void falcon_notify_rx_desc(struct efx_rx
int falcon_probe_rx(struct efx_rx_queue *rx_queue)
{
struct efx_nic *efx = rx_queue->efx;
- return falcon_alloc_special_buffer(efx, &rx_queue->rxd,
- FALCON_RXD_RING_SIZE *
- sizeof(efx_qword_t));
+ int rc = falcon_alloc_special_buffer(efx, &rx_queue->rxd,
+ FALCON_RXD_RING_SIZE *
+ sizeof(efx_qword_t));
+#ifdef CONFIG_SFC_DRIVERLINK
+ if (rc == 0) {
+ struct falcon_nic_data *nic_data = efx->nic_data;
+ nic_data->resources.rxq_min = max(nic_data->resources.rxq_min,
+ (unsigned)rx_queue->queue + 1);
+ }
+#endif
+ return rc;
}
void falcon_init_rx(struct efx_rx_queue *rx_queue)
@@ -1120,9 +1142,18 @@ int falcon_probe_eventq(struct efx_chann
{
struct efx_nic *efx = channel->efx;
unsigned int evq_size;
+ int rc;
evq_size = FALCON_EVQ_SIZE * sizeof(efx_qword_t);
- return falcon_alloc_special_buffer(efx, &channel->eventq, evq_size);
+ rc = falcon_alloc_special_buffer(efx, &channel->eventq, evq_size);
+#ifdef CONFIG_SFC_DRIVERLINK
+ if (rc == 0) {
+ struct falcon_nic_data *nic_data = efx->nic_data;
+ nic_data->resources.evq_int_min = max(nic_data->resources.evq_int_min,
+ (unsigned)channel->channel + 1);
+ }
+#endif
+ return rc;
}
void falcon_init_eventq(struct efx_channel *channel)
@@ -2618,19 +2649,22 @@ fail5:
*/
static int falcon_reset_sram(struct efx_nic *efx)
{
+ struct falcon_nic_data *nic_data = efx->nic_data;
efx_oword_t srm_cfg_reg_ker, gpio_cfg_reg_ker;
- int count;
+ int count, onchip, sram_cfg_val;
/* Set the SRAM wake/sleep GPIO appropriately. */
+ onchip = (nic_data->sram_cfg == SRM_NB_BSZ_ONCHIP_ONLY);
falcon_read(efx, &gpio_cfg_reg_ker, GPIO_CTL_REG_KER);
EFX_SET_OWORD_FIELD(gpio_cfg_reg_ker, GPIO1_OEN, 1);
- EFX_SET_OWORD_FIELD(gpio_cfg_reg_ker, GPIO1_OUT, 1);
+ EFX_SET_OWORD_FIELD(gpio_cfg_reg_ker, GPIO1_OUT, onchip);
falcon_write(efx, &gpio_cfg_reg_ker, GPIO_CTL_REG_KER);
/* Initiate SRAM reset */
+ sram_cfg_val = onchip ? 0 : nic_data->sram_cfg;
EFX_POPULATE_OWORD_2(srm_cfg_reg_ker,
SRAM_OOB_BT_INIT_EN, 1,
- SRM_NUM_BANKS_AND_BANK_SIZE, 0);
+ SRM_NUM_BANKS_AND_BANK_SIZE, sram_cfg_val);
falcon_write(efx, &srm_cfg_reg_ker, SRM_CFG_REG_KER);
/* Wait for SRAM reset to complete */
@@ -2702,8 +2736,10 @@ static void falcon_remove_spi_devices(st
/* Extract non-volatile configuration */
static int falcon_probe_nvconfig(struct efx_nic *efx)
{
+ struct falcon_nic_data *nic_data = efx->nic_data;
struct falcon_nvconfig *nvconfig;
int board_rev;
+ bool onchip_sram;
int rc;
nvconfig = kmalloc(sizeof(*nvconfig), GFP_KERNEL);
@@ -2716,6 +2752,7 @@ static int falcon_probe_nvconfig(struct
efx->phy_type = PHY_TYPE_NONE;
efx->mdio.prtad = MDIO_PRTAD_NONE;
board_rev = 0;
+ onchip_sram = true;
rc = 0;
} else if (rc) {
goto fail1;
@@ -2726,6 +2763,13 @@ static int falcon_probe_nvconfig(struct
efx->phy_type = v2->port0_phy_type;
efx->mdio.prtad = v2->port0_phy_addr;
board_rev = le16_to_cpu(v2->board_revision);
+#ifdef CONFIG_SFC_DRIVERLINK
+ onchip_sram = EFX_OWORD_FIELD(nvconfig->nic_stat_reg,
+ ONCHIP_SRAM);
+#else
+ /* We have no use for external SRAM */
+ onchip_sram = true;
+#endif
if (le16_to_cpu(nvconfig->board_struct_ver) >= 3) {
__le32 fl = v3->spi_device_type[EE_SPI_FLASH];
@@ -2750,6 +2794,21 @@ static int falcon_probe_nvconfig(struct
efx_set_board_info(efx, board_rev);
+ /* Read the SRAM configuration. The register is initialised
+ * automatically but might have been reset since boot.
+ */
+ if (onchip_sram) {
+ nic_data->sram_cfg = SRM_NB_BSZ_ONCHIP_ONLY;
+ } else {
+ nic_data->sram_cfg =
+ EFX_OWORD_FIELD(nvconfig->srm_cfg_reg,
+ SRM_NUM_BANKS_AND_BANK_SIZE);
+ WARN_ON(nic_data->sram_cfg == SRM_NB_BSZ_RESERVED);
+ /* Replace invalid setting with the smallest defaults */
+ if (nic_data->sram_cfg == SRM_NB_BSZ_DEFAULT)
+ nic_data->sram_cfg = SRM_NB_BSZ_1BANKS_2M;
+ }
+
kfree(nvconfig);
return 0;
@@ -2765,9 +2824,9 @@ static int falcon_probe_nvconfig(struct
* should live. */
static int falcon_dimension_resources(struct efx_nic *efx)
{
+ struct falcon_nic_data *nic_data = efx->nic_data;
#ifdef CONFIG_SFC_DRIVERLINK
unsigned internal_dcs_entries;
- struct falcon_nic_data *nic_data = efx->nic_data;
struct efx_dl_falcon_resources *res = &nic_data->resources;
/* Fill out the driverlink resource list */
@@ -2800,16 +2859,64 @@ static int falcon_dimension_resources(st
break;
}
- /* Internal SRAM only for now */
- res->rxq_lim = internal_dcs_entries / RX_DC_ENTRIES;
- res->txq_lim = internal_dcs_entries / TX_DC_ENTRIES;
- res->buffer_table_lim = 8192;
+ if (nic_data->sram_cfg == SRM_NB_BSZ_ONCHIP_ONLY) {
+ res->rxq_lim = internal_dcs_entries / RX_DC_ENTRIES;
+ res->txq_lim = internal_dcs_entries / TX_DC_ENTRIES;
+ res->buffer_table_lim = 8192;
+ nic_data->tx_dc_base = TX_DC_INTERNAL_BASE;
+ nic_data->rx_dc_base = RX_DC_INTERNAL_BASE;
+ } else {
+ unsigned sram_bytes, vnic_bytes, max_vnics, n_vnics, dcs;
+
+ /* Determine how much SRAM we have to play with. We have
+ * to fit buffer table and descriptor caches in.
+ */
+ switch (nic_data->sram_cfg) {
+ case SRM_NB_BSZ_1BANKS_2M:
+ default:
+ sram_bytes = 2 * 1024 * 1024;
+ break;
+ case SRM_NB_BSZ_1BANKS_4M:
+ case SRM_NB_BSZ_2BANKS_4M:
+ sram_bytes = 4 * 1024 * 1024;
+ break;
+ case SRM_NB_BSZ_1BANKS_8M:
+ case SRM_NB_BSZ_2BANKS_8M:
+ sram_bytes = 8 * 1024 * 1024;
+ break;
+ case SRM_NB_BSZ_2BANKS_16M:
+ sram_bytes = 16 * 1024 * 1024;
+ break;
+ }
+ /* For each VNIC allow at least 512 buffer table entries
+ * and descriptor cache for an rxq and txq. Buffer table
+ * space for evqs and dmaqs is relatively trivial, so not
+ * considered in this calculation.
+ */
+ vnic_bytes = 512 * 8 + RX_DC_ENTRIES * 8 + TX_DC_ENTRIES * 8;
+ max_vnics = sram_bytes / vnic_bytes;
+ for (n_vnics = 1; n_vnics < res->evq_timer_min + max_vnics;)
+ n_vnics *= 2;
+ res->rxq_lim = n_vnics;
+ res->txq_lim = n_vnics;
+
+ dcs = n_vnics * TX_DC_ENTRIES * 8;
+ nic_data->tx_dc_base = sram_bytes - dcs;
+ dcs = n_vnics * RX_DC_ENTRIES * 8;
+ nic_data->rx_dc_base = nic_data->tx_dc_base - dcs;
+ res->buffer_table_lim = nic_data->rx_dc_base / 8;
+ }
if (FALCON_IS_DUAL_FUNC(efx))
res->flags |= EFX_DL_FALCON_DUAL_FUNC;
if (EFX_INT_MODE_USE_MSI(efx))
res->flags |= EFX_DL_FALCON_USE_MSI;
+#else
+ /* We ignore external SRAM */
+ EFX_BUG_ON_PARANOID(nic_data->sram_cfg != SRM_NB_BSZ_ONCHIP_ONLY);
+ nic_data->tx_dc_base = TX_DC_INTERNAL_BASE;
+ nic_data->rx_dc_base = RX_DC_INTERNAL_BASE;
#endif
return 0;
@@ -2998,13 +3105,15 @@ int falcon_probe_nic(struct efx_nic *efx
*/
int falcon_init_nic(struct efx_nic *efx)
{
+ struct falcon_nic_data *nic_data = efx->nic_data;
efx_oword_t temp;
unsigned thresh;
int rc;
- /* Use on-chip SRAM */
+ /* Use on-chip SRAM if wanted. */
falcon_read(efx, &temp, NIC_STAT_REG);
- EFX_SET_OWORD_FIELD(temp, ONCHIP_SRAM, 1);
+ EFX_SET_OWORD_FIELD(temp, ONCHIP_SRAM,
+ nic_data->sram_cfg == SRM_NB_BSZ_ONCHIP_ONLY);
falcon_write(efx, &temp, NIC_STAT_REG);
/* Set the source of the GMAC clock */
@@ -3023,9 +3132,9 @@ int falcon_init_nic(struct efx_nic *efx)
return rc;
/* Set positions of descriptor caches in SRAM. */
- EFX_POPULATE_OWORD_1(temp, SRM_TX_DC_BASE_ADR, TX_DC_BASE / 8);
+ EFX_POPULATE_OWORD_1(temp, SRM_TX_DC_BASE_ADR, nic_data->tx_dc_base / 8);
falcon_write(efx, &temp, SRM_TX_DC_CFG_REG_KER);
- EFX_POPULATE_OWORD_1(temp, SRM_RX_DC_BASE_ADR, RX_DC_BASE / 8);
+ EFX_POPULATE_OWORD_1(temp, SRM_RX_DC_BASE_ADR, nic_data->rx_dc_base / 8);
falcon_write(efx, &temp, SRM_RX_DC_CFG_REG_KER);
/* Set TX descriptor cache size. */

sfc-resource-driver: new file, 15052 lines (diff too large; suppressed)

sfc-set-arch: new file, 38 lines

@@ -0,0 +1,38 @@
From: Kieran Mansley <kmansley@solarflare.com>
Subject: set efhw_arch field of device type
References: bnc#489105
Patch-mainline: n/a
Acked-by: jbeulich@novell.com
--- head-2009-04-07.orig/drivers/net/sfc/sfc_resource/ci/efhw/common.h 2009-04-07 14:39:57.000000000 +0200
+++ head-2009-04-07/drivers/net/sfc/sfc_resource/ci/efhw/common.h 2009-04-07 15:02:05.000000000 +0200
@@ -41,6 +41,10 @@
#include <ci/efhw/common_sysdep.h>
+enum efhw_arch {
+ EFHW_ARCH_FALCON,
+};
+
typedef uint32_t efhw_buffer_addr_t;
#define EFHW_BUFFER_ADDR_FMT "[ba:%"PRIx32"]"
--- head-2009-04-07.orig/drivers/net/sfc/sfc_resource/nic.c 2009-04-07 14:39:57.000000000 +0200
+++ head-2009-04-07/drivers/net/sfc/sfc_resource/nic.c 2009-04-07 15:02:05.000000000 +0200
@@ -47,6 +47,7 @@ int efhw_device_type_init(struct efhw_de
switch (device_id) {
case 0x0703:
case 0x6703:
+ dt->arch = EFHW_ARCH_FALCON;
dt->variant = 'A';
switch (class_revision) {
case 0:
@@ -60,6 +61,7 @@ int efhw_device_type_init(struct efhw_de
}
break;
case 0x0710:
+ dt->arch = EFHW_ARCH_FALCON;
dt->variant = 'B';
switch (class_revision) {
case 2:

tmem: new file, 1388 lines (diff too large; suppressed)


@@ -0,0 +1,210 @@
From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: x86: Unify fixup_irqs() for 32-bit and 64-bit kernels
References: bnc#558247
Patch-upstream: Yes
Commit 7a7732bc0f7c46f217dbec723f25366b6285cc42 upstream.
There is no reason to have different fixup_irqs() for 32-bit
and 64-bit kernels. Unify by using the superior 64-bit version for
both the kernels.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Rafael J. Wysocki <rjw@suse.de>
---
arch/x86/kernel/irq.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/irq_32.c | 45 -----------------------------------
arch/x86/kernel/irq_64.c | 58 ----------------------------------------------
3 files changed, 59 insertions(+), 103 deletions(-)
Index: linux-2.6.32-master/arch/x86/kernel/irq.c
===================================================================
--- linux-2.6.32-master.orig/arch/x86/kernel/irq.c
+++ linux-2.6.32-master/arch/x86/kernel/irq.c
@@ -274,3 +274,62 @@ void smp_x86_platform_ipi(struct pt_regs
}
EXPORT_SYMBOL_GPL(vector_used_by_percpu_irq);
+
+#ifdef CONFIG_HOTPLUG_CPU
+/* A cpu has been removed from cpu_online_mask. Reset irq affinities. */
+void fixup_irqs(void)
+{
+ unsigned int irq;
+ static int warned;
+ struct irq_desc *desc;
+
+ for_each_irq_desc(irq, desc) {
+ int break_affinity = 0;
+ int set_affinity = 1;
+ const struct cpumask *affinity;
+
+ if (!desc)
+ continue;
+ if (irq == 2)
+ continue;
+
+ /* interrupt's are disabled at this point */
+ spin_lock(&desc->lock);
+
+ affinity = desc->affinity;
+ if (!irq_has_action(irq) ||
+ cpumask_equal(affinity, cpu_online_mask)) {
+ spin_unlock(&desc->lock);
+ continue;
+ }
+
+ if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
+ break_affinity = 1;
+ affinity = cpu_all_mask;
+ }
+
+ if (desc->chip->mask)
+ desc->chip->mask(irq);
+
+ if (desc->chip->set_affinity)
+ desc->chip->set_affinity(irq, affinity);
+ else if (!(warned++))
+ set_affinity = 0;
+
+ if (desc->chip->unmask)
+ desc->chip->unmask(irq);
+
+ spin_unlock(&desc->lock);
+
+ if (break_affinity && set_affinity)
+ printk("Broke affinity for irq %i\n", irq);
+ else if (!set_affinity)
+ printk("Cannot set affinity for irq %i\n", irq);
+ }
+
+ /* That doesn't seem sufficient. Give it 1ms. */
+ local_irq_enable();
+ mdelay(1);
+ local_irq_disable();
+}
+#endif
Index: linux-2.6.32-master/arch/x86/kernel/irq_32.c
===================================================================
--- linux-2.6.32-master.orig/arch/x86/kernel/irq_32.c
+++ linux-2.6.32-master/arch/x86/kernel/irq_32.c
@@ -211,48 +211,3 @@ bool handle_irq(unsigned irq, struct pt_
return true;
}
-
-#ifdef CONFIG_HOTPLUG_CPU
-
-/* A cpu has been removed from cpu_online_mask. Reset irq affinities. */
-void fixup_irqs(void)
-{
- unsigned int irq;
- struct irq_desc *desc;
-
- for_each_irq_desc(irq, desc) {
- const struct cpumask *affinity;
-
- if (!desc)
- continue;
- if (irq == 2)
- continue;
-
- affinity = desc->affinity;
- if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
- printk("Breaking affinity for irq %i\n", irq);
- affinity = cpu_all_mask;
- }
- if (desc->chip->set_affinity)
- desc->chip->set_affinity(irq, affinity);
- else if (desc->action)
- printk_once("Cannot set affinity for irq %i\n", irq);
- }
-
-#if 0
- barrier();
- /* Ingo Molnar says: "after the IO-APIC masks have been redirected
- [note the nop - the interrupt-enable boundary on x86 is two
- instructions from sti] - to flush out pending hardirqs and
- IPIs. After this point nothing is supposed to reach this CPU." */
- __asm__ __volatile__("sti; nop; cli");
- barrier();
-#else
- /* That doesn't seem sufficient. Give it 1ms. */
- local_irq_enable();
- mdelay(1);
- local_irq_disable();
-#endif
-}
-#endif
-
Index: linux-2.6.32-master/arch/x86/kernel/irq_64.c
===================================================================
--- linux-2.6.32-master.orig/arch/x86/kernel/irq_64.c
+++ linux-2.6.32-master/arch/x86/kernel/irq_64.c
@@ -62,64 +62,6 @@ bool handle_irq(unsigned irq, struct pt_
return true;
}
-#ifdef CONFIG_HOTPLUG_CPU
-/* A cpu has been removed from cpu_online_mask. Reset irq affinities. */
-void fixup_irqs(void)
-{
- unsigned int irq;
- static int warned;
- struct irq_desc *desc;
-
- for_each_irq_desc(irq, desc) {
- int break_affinity = 0;
- int set_affinity = 1;
- const struct cpumask *affinity;
-
- if (!desc)
- continue;
- if (irq == 2)
- continue;
-
- /* interrupt's are disabled at this point */
- spin_lock(&desc->lock);
-
- affinity = desc->affinity;
- if (!irq_has_action(irq) ||
- cpumask_equal(affinity, cpu_online_mask)) {
- spin_unlock(&desc->lock);
- continue;
- }
-
- if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
- break_affinity = 1;
- affinity = cpu_all_mask;
- }
-
- if (desc->chip->mask)
- desc->chip->mask(irq);
-
- if (desc->chip->set_affinity)
- desc->chip->set_affinity(irq, affinity);
- else if (!(warned++))
- set_affinity = 0;
-
- if (desc->chip->unmask)
- desc->chip->unmask(irq);
-
- spin_unlock(&desc->lock);
-
- if (break_affinity && set_affinity)
- printk("Broke affinity for irq %i\n", irq);
- else if (!set_affinity)
- printk("Cannot set affinity for irq %i\n", irq);
- }
-
- /* That doesn't seem sufficient. Give it 1ms. */
- local_irq_enable();
- mdelay(1);
- local_irq_disable();
-}
-#endif
extern void call_softirq(void);


@@ -0,0 +1,96 @@
From: Borislav Petkov <petkovbb@googlemail.com>
Subject: x86, cpu: mv display_cacheinfo -> cpu_detect_cache_sizes
Patch-mainline: 2.6.33-rc1
References: bnc#564618, FATE#306952
Git-commit: 27c13ecec4d8856687b50b959e1146845b478f95
display_cacheinfo() doesn't display anything anymore and it is used to
detect CPU cache sizes. Rename it accordingly.
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <20091121130145.GA31357@liondog.tnic>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Rafael J. Wysocki <rjw@suse.de>
---
arch/x86/kernel/cpu/amd.c | 2 +-
arch/x86/kernel/cpu/centaur.c | 2 +-
arch/x86/kernel/cpu/common.c | 4 ++--
arch/x86/kernel/cpu/cpu.h | 2 +-
arch/x86/kernel/cpu/cyrix.c | 2 +-
arch/x86/kernel/cpu/transmeta.c | 2 +-
6 files changed, 7 insertions(+), 7 deletions(-)
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -535,7 +535,7 @@ static void __cpuinit init_amd(struct cp
}
}
- display_cacheinfo(c);
+ cpu_detect_cache_sizes(c);
/* Multi core CPU? */
if (c->extended_cpuid_level >= 0x80000008) {
--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -294,7 +294,7 @@ static void __cpuinit init_c3(struct cpu
set_cpu_cap(c, X86_FEATURE_REP_GOOD);
}
- display_cacheinfo(c);
+ cpu_detect_cache_sizes(c);
}
enum {
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -61,7 +61,7 @@ void __init setup_cpu_local_masks(void)
static void __cpuinit default_init(struct cpuinfo_x86 *c)
{
#ifdef CONFIG_X86_64
- display_cacheinfo(c);
+ cpu_detect_cache_sizes(c);
#else
/* Not much we can do here... */
/* Check if at least it has cpuid */
@@ -383,7 +383,7 @@ static void __cpuinit get_model_name(str
}
}
-void __cpuinit display_cacheinfo(struct cpuinfo_x86 *c)
+void __cpuinit cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
{
unsigned int n, dummy, ebx, ecx, edx, l2size;
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -32,6 +32,6 @@ struct cpu_dev {
extern const struct cpu_dev *const __x86_cpu_dev_start[],
*const __x86_cpu_dev_end[];
-extern void display_cacheinfo(struct cpuinfo_x86 *c);
+extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
#endif
--- a/arch/x86/kernel/cpu/cyrix.c
+++ b/arch/x86/kernel/cpu/cyrix.c
@@ -373,7 +373,7 @@ static void __cpuinit init_nsc(struct cp
/* Handle the GX (Formally known as the GX2) */
if (c->x86 == 5 && c->x86_model == 5)
- display_cacheinfo(c);
+ cpu_detect_cache_sizes(c);
else
init_cyrix(c);
}
--- a/arch/x86/kernel/cpu/transmeta.c
+++ b/arch/x86/kernel/cpu/transmeta.c
@@ -26,7 +26,7 @@ static void __cpuinit init_transmeta(str
early_init_transmeta(c);
- display_cacheinfo(c);
+ cpu_detect_cache_sizes(c);
/* Print CMS and CPU revision */
max = cpuid_eax(0x80860000);

x86_64_defconfig-server: new file, 4762 lines (diff too large; suppressed)

xen-balloon-max-target: new file, 78 lines

@@ -0,0 +1,78 @@
From: ccoffing@novell.com
Subject: Expose min/max limits of domain ballooning
Patch-mainline: obsolete
References: 152667, 184727
jb: Also added this to the sysfs representation.
--- sle11sp1-2010-02-02.orig/drivers/xen/balloon/balloon.c 2010-02-02 14:56:27.000000000 +0100
+++ sle11sp1-2010-02-02/drivers/xen/balloon/balloon.c 2010-02-02 15:08:54.000000000 +0100
@@ -239,7 +239,7 @@ static unsigned long current_target(void
return target;
}
-static unsigned long minimum_target(void)
+unsigned long balloon_minimum_target(void)
{
#ifndef CONFIG_XEN
#define max_pfn num_physpages
@@ -461,7 +461,7 @@ static void balloon_process(struct work_
void balloon_set_new_target(unsigned long target)
{
/* No need for lock. Not read-modify-write updates. */
- bs.target_pages = max(target, minimum_target());
+ bs.target_pages = max(target, balloon_minimum_target());
schedule_work(&balloon_worker);
}
@@ -536,10 +536,13 @@ static int balloon_read(char *page, char
page,
"Current allocation: %8lu kB\n"
"Requested target: %8lu kB\n"
+ "Minimum target: %8lu kB\n"
+ "Maximum target: %8lu kB\n"
"Low-mem balloon: %8lu kB\n"
"High-mem balloon: %8lu kB\n"
"Driver pages: %8lu kB\n",
PAGES2KB(bs.current_pages), PAGES2KB(bs.target_pages),
+ PAGES2KB(balloon_minimum_target()), PAGES2KB(num_physpages),
PAGES2KB(bs.balloon_low), PAGES2KB(bs.balloon_high),
PAGES2KB(bs.driver_pages));
--- sle11sp1-2010-02-02.orig/drivers/xen/balloon/common.h 2009-06-09 15:01:37.000000000 +0200
+++ sle11sp1-2010-02-02/drivers/xen/balloon/common.h 2009-08-19 10:36:49.000000000 +0200
@@ -52,5 +52,6 @@ int balloon_sysfs_init(void);
void balloon_sysfs_exit(void);
void balloon_set_new_target(unsigned long target);
+unsigned long balloon_minimum_target(void);
#endif /* __XEN_BALLOON_COMMON_H__ */
--- sle11sp1-2010-02-02.orig/drivers/xen/balloon/sysfs.c 2009-11-06 10:51:55.000000000 +0100
+++ sle11sp1-2010-02-02/drivers/xen/balloon/sysfs.c 2009-08-19 10:36:47.000000000 +0200
@@ -31,6 +31,7 @@
#include <linux/capability.h>
#include <linux/errno.h>
#include <linux/init.h>
+#include <linux/mm.h>
#include <linux/stat.h>
#include <linux/string.h>
#include <linux/sysdev.h>
@@ -53,6 +54,8 @@
static SYSDEV_ATTR(name, S_IRUGO, show_##name, NULL)
BALLOON_SHOW(current_kb, "%lu\n", PAGES2KB(bs.current_pages));
+BALLOON_SHOW(min_kb, "%lu\n", PAGES2KB(balloon_minimum_target()));
+BALLOON_SHOW(max_kb, "%lu\n", PAGES2KB(num_physpages));
BALLOON_SHOW(low_kb, "%lu\n", PAGES2KB(bs.balloon_low));
BALLOON_SHOW(high_kb, "%lu\n", PAGES2KB(bs.balloon_high));
BALLOON_SHOW(driver_kb, "%lu\n", PAGES2KB(bs.driver_pages));
@@ -123,6 +126,8 @@ static struct sysdev_attribute *balloon_
static struct attribute *balloon_info_attrs[] = {
&attr_current_kb.attr,
+ &attr_min_kb.attr,
+ &attr_max_kb.attr,
&attr_low_kb.attr,
&attr_high_kb.attr,
&attr_driver_kb.attr,

xen-blkback-bimodal-suse: new file, 39 lines

@@ -0,0 +1,39 @@
Subject: backward compatibility
From: Gerd Hoffmann <kraxel@suse.de>
Patch-mainline: obsolete
---
linux-2.6-xen-sparse/drivers/xen/blkback/xenbus.c | 6 ++++++
linux-2.6-xen-sparse/drivers/xen/blktap/xenbus.c | 6 ++++++
2 files changed, 12 insertions(+)
--- sle11sp1-2010-03-22.orig/drivers/xen/blkback/xenbus.c 2010-03-22 12:26:08.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blkback/xenbus.c 2010-03-22 12:53:24.000000000 +0100
@@ -500,6 +500,12 @@ static int connect_ring(struct backend_i
be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_64))
be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
+#if 1 /* maintain compatibility with early sles10-sp1 and paravirt netware betas */
+ else if (0 == strcmp(protocol, "1"))
+ be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
+ else if (0 == strcmp(protocol, "2"))
+ be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
+#endif
else {
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -1;
--- sle11sp1-2010-03-22.orig/drivers/xen/blktap/xenbus.c 2010-01-27 14:48:30.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blktap/xenbus.c 2010-01-27 14:59:26.000000000 +0100
@@ -440,6 +440,12 @@ static int connect_ring(struct backend_i
be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_64))
be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
+#if 1 /* maintain compatibility with early sles10-sp1 and paravirt netware betas */
+ else if (0 == strcmp(protocol, "1"))
+ be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
+ else if (0 == strcmp(protocol, "2"))
+ be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
+#endif
else {
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -1;

xen-blkback-cdrom: new file, 233 lines

@@ -0,0 +1,233 @@
Subject: CDROM removable media-present attribute plus handling code
From: plc@novell.com
Patch-mainline: obsolete
References: 159907
--- sle11sp1-2010-03-22.orig/drivers/xen/blkback/Makefile 2009-06-09 15:01:37.000000000 +0200
+++ sle11sp1-2010-03-22/drivers/xen/blkback/Makefile 2009-06-09 15:50:31.000000000 +0200
@@ -1,4 +1,4 @@
obj-$(CONFIG_XEN_BLKDEV_BACKEND) := blkbk.o
obj-$(CONFIG_XEN_BLKBACK_PAGEMAP) += blkback-pagemap.o
-blkbk-y := blkback.o xenbus.o interface.o vbd.o
+blkbk-y := blkback.o xenbus.o interface.o vbd.o cdrom.o
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ sle11sp1-2010-03-22/drivers/xen/blkback/cdrom.c 2010-03-22 12:54:41.000000000 +0100
@@ -0,0 +1,162 @@
+/******************************************************************************
+ * blkback/cdrom.c
+ *
+ * Routines for managing cdrom watch and media-present attribute of a
+ * cdrom type virtual block device (VBD).
+ *
+ * Copyright (c) 2003-2005, Keir Fraser & Steve Hand
+ * Copyright (c) 2007 Pat Campbell
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "common.h"
+
+#undef DPRINTK
+#define DPRINTK(_f, _a...) \
+ printk("(%s() file=%s, line=%d) " _f "\n", \
+ __PRETTY_FUNCTION__, __FILE__ , __LINE__ , ##_a )
+
+
+#define MEDIA_PRESENT "media-present"
+
+static void cdrom_media_changed(struct xenbus_watch *, const char **, unsigned int);
+
+/**
+ * Writes media-present=1 attribute for the given vbd device if not
+ * already there
+ */
+static int cdrom_xenstore_write_media_present(struct backend_info *be)
+{
+ struct xenbus_device *dev = be->dev;
+ struct xenbus_transaction xbt;
+ int err;
+ int media_present;
+
+ err = xenbus_scanf(XBT_NIL, dev->nodename, MEDIA_PRESENT, "%d",
+ &media_present);
+ if (0 < err) {
+ DPRINTK("already written err%d", err);
+ return(0);
+ }
+ media_present = 1;
+
+again:
+ err = xenbus_transaction_start(&xbt);
+ if (err) {
+ xenbus_dev_fatal(dev, err, "starting transaction");
+ return(-1);
+ }
+
+ err = xenbus_printf(xbt, dev->nodename, MEDIA_PRESENT, "%d", media_present );
+ if (err) {
+ xenbus_dev_fatal(dev, err, "writing %s/%s",
+ dev->nodename, MEDIA_PRESENT);
+ goto abort;
+ }
+ err = xenbus_transaction_end(xbt, 0);
+ if (err == -EAGAIN)
+ goto again;
+ if (err)
+ xenbus_dev_fatal(dev, err, "ending transaction");
+ return 0;
+ abort:
+ xenbus_transaction_end(xbt, 1);
+ return -1;
+}
+
+/**
+ *
+ */
+static int cdrom_is_type(struct backend_info *be)
+{
+ DPRINTK("type:%x", be->blkif->vbd.type );
+ return (be->blkif->vbd.type & VDISK_CDROM)
+ && (be->blkif->vbd.type & GENHD_FL_REMOVABLE);
+}
+
+/**
+ *
+ */
+void cdrom_add_media_watch(struct backend_info *be)
+{
+ struct xenbus_device *dev = be->dev;
+ int err;
+
+ DPRINTK("nodename:%s", dev->nodename);
+ if (cdrom_is_type(be)) {
+ DPRINTK("is a cdrom");
+ if ( cdrom_xenstore_write_media_present(be) == 0 ) {
+ DPRINTK( "xenstore wrote OK");
+ err = xenbus_watch_path2(dev, dev->nodename, MEDIA_PRESENT,
+ &be->cdrom_watch,
+ cdrom_media_changed);
+ if (err)
+ DPRINTK( "media_present watch add failed" );
+ }
+ }
+}
+
+/**
+ * Callback received when the "media_present" xenstore node is changed
+ */
+static void cdrom_media_changed(struct xenbus_watch *watch,
+ const char **vec, unsigned int len)
+{
+ int err;
+ unsigned media_present;
+ struct backend_info *be
+ = container_of(watch, struct backend_info, cdrom_watch);
+ struct xenbus_device *dev = be->dev;
+
+ if (!cdrom_is_type(be)) {
+ DPRINTK("callback not for a cdrom" );
+ return;
+ }
+
+ err = xenbus_scanf(XBT_NIL, dev->nodename, MEDIA_PRESENT, "%d",
+ &media_present);
+ if (err == 0 || err == -ENOENT) {
+ DPRINTK("xenbus_read of cdrom media_present node error:%d",err);
+ return;
+ }
+
+ if (media_present == 0)
+ vbd_free(&be->blkif->vbd);
+ else {
+ char *p = strrchr(dev->otherend, '/') + 1;
+ long handle = simple_strtoul(p, NULL, 0);
+
+ if (!be->blkif->vbd.bdev) {
+ err = vbd_create(be->blkif, handle, be->major, be->minor,
+ !strchr(be->mode, 'w'), 1);
+ if (err) {
+ be->major = be->minor = 0;
+ xenbus_dev_fatal(dev, err, "creating vbd structure");
+ return;
+ }
+ }
+ }
+}
--- sle11sp1-2010-03-22.orig/drivers/xen/blkback/common.h 2010-03-22 12:20:19.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blkback/common.h 2010-03-22 12:54:11.000000000 +0100
@@ -106,6 +106,7 @@ struct backend_info
struct xenbus_device *dev;
blkif_t *blkif;
struct xenbus_watch backend_watch;
+ struct xenbus_watch cdrom_watch;
unsigned major;
unsigned minor;
char *mode;
@@ -152,4 +153,7 @@ int blkif_schedule(void *arg);
int blkback_barrier(struct xenbus_transaction xbt,
struct backend_info *be, int state);
+/* cdrom media change */
+void cdrom_add_media_watch(struct backend_info *be);
+
#endif /* __BLKIF__BACKEND__COMMON_H__ */
--- sle11sp1-2010-03-22.orig/drivers/xen/blkback/vbd.c 2009-11-06 10:52:09.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blkback/vbd.c 2010-03-22 12:53:45.000000000 +0100
@@ -108,6 +108,9 @@ int vbd_translate(struct phys_req *req,
if ((operation != READ) && vbd->readonly)
goto out;
+ if (vbd->bdev == NULL)
+ goto out;
+
if (unlikely((req->sector_number + req->nr_sects) > vbd_sz(vbd)))
goto out;
--- sle11sp1-2010-03-22.orig/drivers/xen/blkback/xenbus.c 2010-03-22 12:53:34.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blkback/xenbus.c 2010-03-22 12:54:58.000000000 +0100
@@ -187,6 +187,12 @@ static int blkback_remove(struct xenbus_
be->backend_watch.node = NULL;
}
+ if (be->cdrom_watch.node) {
+ unregister_xenbus_watch(&be->cdrom_watch);
+ kfree(be->cdrom_watch.node);
+ be->cdrom_watch.node = NULL;
+ }
+
if (be->blkif) {
blkif_disconnect(be->blkif);
vbd_free(&be->blkif->vbd);
@@ -343,6 +349,9 @@ static void backend_changed(struct xenbu
/* We're potentially connected now */
update_blkif_status(be->blkif);
+
+ /* Add watch for cdrom media status if necessary */
+ cdrom_add_media_watch(be);
}
}

711
xen-blkfront-cdrom Normal file

@ -0,0 +1,711 @@
From: plc@novell.com
Subject: implement forwarding of CD-ROM specific commands
Patch-mainline: obsolete
References: fate#300964
--- sle11sp1-2010-03-22.orig/drivers/cdrom/Makefile 2010-03-22 12:07:53.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/cdrom/Makefile 2009-10-15 12:13:13.000000000 +0200
@@ -9,6 +9,7 @@ obj-$(CONFIG_BLK_DEV_IDECD) +=
obj-$(CONFIG_BLK_DEV_SR) += cdrom.o
obj-$(CONFIG_PARIDE_PCD) += cdrom.o
obj-$(CONFIG_CDROM_PKTCDVD) += cdrom.o
+obj-$(CONFIG_XEN_BLKDEV_FRONTEND) += cdrom.o
obj-$(CONFIG_VIOCD) += viocd.o cdrom.o
obj-$(CONFIG_GDROM) += gdrom.o cdrom.o
--- sle11sp1-2010-03-22.orig/drivers/xen/blkfront/Makefile 2007-06-12 13:13:44.000000000 +0200
+++ sle11sp1-2010-03-22/drivers/xen/blkfront/Makefile 2009-10-15 12:13:13.000000000 +0200
@@ -1,5 +1,5 @@
obj-$(CONFIG_XEN_BLKDEV_FRONTEND) := xenblk.o
-xenblk-objs := blkfront.o vbd.o
+xenblk-objs := blkfront.o vbd.o vcd.o
--- sle11sp1-2010-03-22.orig/drivers/xen/blkfront/blkfront.c 2010-03-22 12:57:12.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blkfront/blkfront.c 2010-03-22 12:57:16.000000000 +0100
@@ -395,6 +395,8 @@ static void connect(struct blkfront_info
add_disk(info->gd);
info->is_ready = 1;
+
+ register_vcd(info);
}
/**
@@ -424,6 +426,8 @@ static void blkfront_closing(struct blkf
xlvbd_sysfs_delif(info);
+ unregister_vcd(info);
+
xlvbd_del(info);
out:
--- sle11sp1-2010-03-22.orig/drivers/xen/blkfront/block.h 2010-01-18 16:49:13.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blkfront/block.h 2010-01-18 17:06:20.000000000 +0100
@@ -163,4 +163,8 @@ static inline void xlvbd_sysfs_delif(str
}
#endif
+/* Virtual cdrom block-device */
+extern void register_vcd(struct blkfront_info *info);
+extern void unregister_vcd(struct blkfront_info *info);
+
#endif /* __XEN_DRIVERS_BLOCK_H__ */
--- sle11sp1-2010-03-22.orig/drivers/xen/blkfront/vbd.c 2010-01-18 16:54:56.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blkfront/vbd.c 2010-01-18 17:06:22.000000000 +0100
@@ -370,7 +370,8 @@ xlvbd_add(blkif_sector_t capacity, int v
goto out;
info->mi = mi;
- if ((minor & ((1 << mi->type->partn_shift) - 1)) == 0)
+ if (!(vdisk_info & VDISK_CDROM) &&
+ (minor & ((1 << mi->type->partn_shift) - 1)) == 0)
nr_minors = 1 << mi->type->partn_shift;
err = xlbd_reserve_minors(mi, minor, nr_minors);
@@ -384,7 +385,7 @@ xlvbd_add(blkif_sector_t capacity, int v
offset = mi->index * mi->type->disks_per_major +
(minor >> mi->type->partn_shift);
- if (nr_minors > 1) {
+ if (nr_minors > 1 || (vdisk_info & VDISK_CDROM)) {
if (offset < 26) {
sprintf(gd->disk_name, "%s%c",
mi->type->diskname, 'a' + offset );
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ sle11sp1-2010-03-22/drivers/xen/blkfront/vcd.c 2010-02-09 17:17:54.000000000 +0100
@@ -0,0 +1,509 @@
+/*******************************************************************************
+ * vcd.c
+ *
+ * Implements CDROM cmd packet passing between frontend guest and backend driver.
+ *
+ * Copyright (c) 2008, Pat Campell plc@novell.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#define REVISION "$Revision: 1.0 $"
+
+#include <linux/module.h>
+#include <linux/blkdev.h>
+#include <linux/list.h>
+#include <linux/cdrom.h>
+#include <xen/interface/io/cdromif.h>
+#include "block.h"
+
+/* List of cdrom_device_info, can have as many as blkfront supports */
+struct vcd_disk {
+ struct list_head vcd_entry;
+ struct cdrom_device_info vcd_cdrom_info;
+ spinlock_t vcd_cdrom_info_lock;
+};
+static LIST_HEAD(vcd_disks);
+static DEFINE_SPINLOCK(vcd_disks_lock);
+
+static struct vcd_disk *xencdrom_get_list_entry(struct gendisk *disk)
+{
+ struct vcd_disk *ret_vcd = NULL;
+ struct vcd_disk *vcd;
+
+ spin_lock(&vcd_disks_lock);
+ list_for_each_entry(vcd, &vcd_disks, vcd_entry) {
+ if (vcd->vcd_cdrom_info.disk == disk) {
+ spin_lock(&vcd->vcd_cdrom_info_lock);
+ ret_vcd = vcd;
+ break;
+ }
+ }
+ spin_unlock(&vcd_disks_lock);
+ return ret_vcd;
+}
+
+static void submit_message(struct blkfront_info *info, void *sp)
+{
+ struct request *req = NULL;
+
+ req = blk_get_request(info->rq, READ, __GFP_WAIT);
+ if (blk_rq_map_kern(info->rq, req, sp, PAGE_SIZE, __GFP_WAIT))
+ goto out;
+
+ req->rq_disk = info->gd;
+#if LINUX_VERSION_CODE > KERNEL_VERSION(2,6,18)
+ req->cmd_type = REQ_TYPE_BLOCK_PC;
+ req->cmd_flags |= REQ_NOMERGE;
+#else
+ req->flags |= REQ_BLOCK_PC;
+#endif
+ req->__sector = 0;
+ req->__data_len = PAGE_SIZE;
+ req->timeout = 60*HZ;
+
+ blk_execute_rq(req->q, info->gd, req, 1);
+
+out:
+ blk_put_request(req);
+}
+
+static int submit_cdrom_cmd(struct blkfront_info *info,
+ struct packet_command *cgc)
+{
+ int ret = 0;
+ struct page *page;
+ size_t size;
+ union xen_block_packet *sp;
+ struct xen_cdrom_packet *xcp;
+ struct vcd_generic_command *vgc;
+
+ if (cgc->buffer && cgc->buflen > MAX_PACKET_DATA) {
+ printk(KERN_WARNING "%s() Packet buffer length is too large\n", __func__);
+ return -EIO;
+ }
+
+ page = alloc_page(GFP_NOIO);
+ if (!page) {
+ printk(KERN_CRIT "%s() Unable to allocate page\n", __func__);
+ return -ENOMEM;
+ }
+
+ size = PAGE_SIZE;
+ memset(page_address(page), 0, PAGE_SIZE);
+ sp = page_address(page);
+ xcp = &(sp->xcp);
+ xcp->type = XEN_TYPE_CDROM_PACKET;
+ xcp->payload_offset = PACKET_PAYLOAD_OFFSET;
+
+ vgc = (struct vcd_generic_command *)((char *)sp + xcp->payload_offset);
+ memcpy(vgc->cmd, cgc->cmd, CDROM_PACKET_SIZE);
+ vgc->stat = cgc->stat;
+ vgc->data_direction = cgc->data_direction;
+ vgc->quiet = cgc->quiet;
+ vgc->timeout = cgc->timeout;
+ if (cgc->sense) {
+ vgc->sense_offset = PACKET_SENSE_OFFSET;
+ memcpy((char *)sp + vgc->sense_offset, cgc->sense, sizeof(struct request_sense));
+ }
+ if (cgc->buffer) {
+ vgc->buffer_offset = PACKET_BUFFER_OFFSET;
+ memcpy((char *)sp + vgc->buffer_offset, cgc->buffer, cgc->buflen);
+ vgc->buflen = cgc->buflen;
+ }
+
+ submit_message(info,sp);
+
+ if (xcp->ret)
+ ret = xcp->err;
+
+ if (cgc->sense) {
+ memcpy(cgc->sense, (char *)sp + PACKET_SENSE_OFFSET, sizeof(struct request_sense));
+ }
+ if (cgc->buffer && cgc->buflen) {
+ memcpy(cgc->buffer, (char *)sp + PACKET_BUFFER_OFFSET, cgc->buflen);
+ }
+
+ __free_page(page);
+ return ret;
+}
+
+
+static int xencdrom_open(struct cdrom_device_info *cdi, int purpose)
+{
+ int ret = 0;
+ struct page *page;
+ struct blkfront_info *info;
+ union xen_block_packet *sp;
+ struct xen_cdrom_open *xco;
+
+ info = cdi->disk->private_data;
+
+ if (!info->xbdev)
+ return -ENODEV;
+
+ if (strlen(info->xbdev->otherend) > MAX_PACKET_DATA) {
+ return -EIO;
+ }
+
+ page = alloc_page(GFP_NOIO);
+ if (!page) {
+ printk(KERN_CRIT "%s() Unable to allocate page\n", __func__);
+ return -ENOMEM;
+ }
+
+ memset(page_address(page), 0, PAGE_SIZE);
+ sp = page_address(page);
+ xco = &(sp->xco);
+ xco->type = XEN_TYPE_CDROM_OPEN;
+ xco->payload_offset = sizeof(struct xen_cdrom_open);
+ strcpy((char *)sp + xco->payload_offset, info->xbdev->otherend);
+
+ submit_message(info,sp);
+
+ if (xco->ret) {
+ ret = xco->err;
+ goto out;
+ }
+
+ if (xco->media_present)
+ set_capacity(cdi->disk, xco->sectors);
+
+out:
+ __free_page(page);
+ return ret;
+}
+
+static void xencdrom_release(struct cdrom_device_info *cdi)
+{
+}
+
+static int xencdrom_media_changed(struct cdrom_device_info *cdi, int disc_nr)
+{
+ int ret;
+ struct page *page;
+ struct blkfront_info *info;
+ union xen_block_packet *sp;
+ struct xen_cdrom_media_changed *xcmc;
+
+ info = cdi->disk->private_data;
+
+ page = alloc_page(GFP_NOIO);
+ if (!page) {
+ printk(KERN_CRIT "%s() Unable to allocate page\n", __func__);
+ return -ENOMEM;
+ }
+
+ memset(page_address(page), 0, PAGE_SIZE);
+ sp = page_address(page);
+ xcmc = &(sp->xcmc);
+ xcmc->type = XEN_TYPE_CDROM_MEDIA_CHANGED;
+ submit_message(info,sp);
+ ret = xcmc->media_changed;
+
+ __free_page(page);
+
+ return ret;
+}
+
+static int xencdrom_tray_move(struct cdrom_device_info *cdi, int position)
+{
+ int ret;
+ struct packet_command cgc;
+ struct blkfront_info *info;
+
+ info = cdi->disk->private_data;
+ init_cdrom_command(&cgc, NULL, 0, CGC_DATA_NONE);
+ cgc.cmd[0] = GPCMD_START_STOP_UNIT;
+ if (position)
+ cgc.cmd[4] = 2;
+ else
+ cgc.cmd[4] = 3;
+ ret = submit_cdrom_cmd(info, &cgc);
+ return ret;
+}
+
+static int xencdrom_lock_door(struct cdrom_device_info *cdi, int lock)
+{
+ int ret = 0;
+ struct blkfront_info *info;
+ struct packet_command cgc;
+
+ info = cdi->disk->private_data;
+ init_cdrom_command(&cgc, NULL, 0, CGC_DATA_NONE);
+ cgc.cmd[0] = GPCMD_PREVENT_ALLOW_MEDIUM_REMOVAL;
+ cgc.cmd[4] = lock;
+ ret = submit_cdrom_cmd(info, &cgc);
+ return ret;
+}
+
+static int xencdrom_packet(struct cdrom_device_info *cdi,
+ struct packet_command *cgc)
+{
+ int ret = -EIO;
+ struct blkfront_info *info;
+
+ info = cdi->disk->private_data;
+ ret = submit_cdrom_cmd(info, cgc);
+ cgc->stat = ret;
+ return ret;
+}
+
+static int xencdrom_audio_ioctl(struct cdrom_device_info *cdi, unsigned int cmd,
+ void *arg)
+{
+ return -EINVAL;
+}
+
+/* Query backend to see if CDROM packets are supported */
+static int xencdrom_supported(struct blkfront_info *info)
+{
+ struct page *page;
+ union xen_block_packet *sp;
+ struct xen_cdrom_support *xcs;
+
+ page = alloc_page(GFP_NOIO);
+ if (!page) {
+ printk(KERN_CRIT "%s() Unable to allocate page\n", __func__);
+ return -ENOMEM;
+ }
+
+ memset(page_address(page), 0, PAGE_SIZE);
+ sp = page_address(page);
+ xcs = &(sp->xcs);
+ xcs->type = XEN_TYPE_CDROM_SUPPORT;
+ submit_message(info,sp);
+ return xcs->supported;
+}
+
+static struct cdrom_device_ops xencdrom_dops = {
+ .open = xencdrom_open,
+ .release = xencdrom_release,
+ .media_changed = xencdrom_media_changed,
+ .tray_move = xencdrom_tray_move,
+ .lock_door = xencdrom_lock_door,
+ .generic_packet = xencdrom_packet,
+ .audio_ioctl = xencdrom_audio_ioctl,
+ .capability = (CDC_CLOSE_TRAY | CDC_OPEN_TRAY | CDC_LOCK | \
+ CDC_MEDIA_CHANGED | CDC_GENERIC_PACKET | CDC_DVD | \
+ CDC_CD_R),
+ .n_minors = 1,
+};
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,28)
+static int xencdrom_block_open(struct inode *inode, struct file *file)
+{
+ struct block_device *bd = inode->i_bdev;
+#else
+static int xencdrom_block_open(struct block_device *bd, fmode_t mode)
+{
+#endif
+ struct blkfront_info *info = bd->bd_disk->private_data;
+ struct vcd_disk *vcd;
+ int ret = 0;
+
+ if (!info->xbdev)
+ return -ENODEV;
+
+ if ((vcd = xencdrom_get_list_entry(info->gd))) {
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,28)
+ ret = cdrom_open(&vcd->vcd_cdrom_info, inode, file);
+#else
+ ret = cdrom_open(&vcd->vcd_cdrom_info, bd, mode);
+#endif
+ info->users = vcd->vcd_cdrom_info.use_count;
+ spin_unlock(&vcd->vcd_cdrom_info_lock);
+ }
+ return ret;
+}
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,28)
+static int xencdrom_block_release(struct inode *inode, struct file *file)
+{
+ struct gendisk *gd = inode->i_bdev->bd_disk;
+#else
+static int xencdrom_block_release(struct gendisk *gd, fmode_t mode)
+{
+#endif
+ struct blkfront_info *info = gd->private_data;
+ struct vcd_disk *vcd;
+ int ret = 0;
+
+ if ((vcd = xencdrom_get_list_entry(info->gd))) {
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,28)
+ ret = cdrom_release(&vcd->vcd_cdrom_info, file);
+#else
+ cdrom_release(&vcd->vcd_cdrom_info, mode);
+#endif
+ spin_unlock(&vcd->vcd_cdrom_info_lock);
+ if (vcd->vcd_cdrom_info.use_count == 0) {
+ info->users = 1;
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,28)
+ blkif_release(inode, file);
+#else
+ blkif_release(gd, mode);
+#endif
+ }
+ }
+ return ret;
+}
+
+#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,28)
+static int xencdrom_block_ioctl(struct inode *inode, struct file *file,
+ unsigned cmd, unsigned long arg)
+{
+ struct block_device *bd = inode->i_bdev;
+#else
+static int xencdrom_block_ioctl(struct block_device *bd, fmode_t mode,
+ unsigned cmd, unsigned long arg)
+{
+#endif
+ struct blkfront_info *info = bd->bd_disk->private_data;
+ struct vcd_disk *vcd;
+ int ret = 0;
+
+ if (!(vcd = xencdrom_get_list_entry(info->gd)))
+ goto out;
+
+ switch (cmd) {
+ case 2285: /* SG_IO */
+ ret = -ENOSYS;
+ break;
+ case CDROMEJECT:
+ ret = xencdrom_tray_move(&vcd->vcd_cdrom_info, 1);
+ break;
+ case CDROMCLOSETRAY:
+ ret = xencdrom_tray_move(&vcd->vcd_cdrom_info, 0);
+ break;
+ case CDROM_GET_CAPABILITY:
+ ret = vcd->vcd_cdrom_info.ops->capability & ~vcd->vcd_cdrom_info.mask;
+ break;
+ case CDROM_SET_OPTIONS:
+ ret = vcd->vcd_cdrom_info.options;
+ break;
+ case CDROM_SEND_PACKET:
+ ret = submit_cdrom_cmd(info, (struct packet_command *)arg);
+ break;
+ default:
+ /* Not supported, augment supported above if necessary */
+ printk("%s():%d Unsupported IOCTL:%x \n", __func__, __LINE__, cmd);
+ ret = -ENOTTY;
+ break;
+ }
+ spin_unlock(&vcd->vcd_cdrom_info_lock);
+out:
+ return ret;
+}
+
+/* Called as result of cdrom_open, vcd_cdrom_info_lock already held */
+static int xencdrom_block_media_changed(struct gendisk *disk)
+{
+ struct vcd_disk *vcd;
+ struct vcd_disk *ret_vcd = NULL;
+ int ret = 0;
+
+ spin_lock(&vcd_disks_lock);
+ list_for_each_entry(vcd, &vcd_disks, vcd_entry) {
+ if (vcd->vcd_cdrom_info.disk == disk) {
+ ret_vcd = vcd;
+ break;
+ }
+ }
+ spin_unlock(&vcd_disks_lock);
+ if (ret_vcd) {
+ ret = cdrom_media_changed(&ret_vcd->vcd_cdrom_info);
+ }
+ return ret;
+}
+
+static const struct block_device_operations xencdrom_bdops =
+{
+ .owner = THIS_MODULE,
+ .open = xencdrom_block_open,
+ .release = xencdrom_block_release,
+ .ioctl = xencdrom_block_ioctl,
+ .media_changed = xencdrom_block_media_changed,
+};
+
+void register_vcd(struct blkfront_info *info)
+{
+ struct gendisk *gd = info->gd;
+ struct vcd_disk *vcd;
+
+ /* Make sure this is for a CD device */
+ if (!(gd->flags & GENHD_FL_CD))
+ goto out;
+
+ /* Make sure we have backend support */
+ if (!xencdrom_supported(info)) {
+ goto out;
+ }
+
+ /* Create new vcd_disk and fill in cdrom_info */
+ vcd = (struct vcd_disk *)kzalloc(sizeof(struct vcd_disk), GFP_KERNEL);
+ if (!vcd) {
+ printk(KERN_INFO "%s(): Unable to allocate vcd struct!\n", __func__);
+ goto out;
+ }
+ spin_lock_init(&vcd->vcd_cdrom_info_lock);
+
+ vcd->vcd_cdrom_info.ops = &xencdrom_dops;
+ vcd->vcd_cdrom_info.speed = 4;
+ vcd->vcd_cdrom_info.capacity = 1;
+ vcd->vcd_cdrom_info.options = 0;
+ strcpy(vcd->vcd_cdrom_info.name, gd->disk_name);
+ vcd->vcd_cdrom_info.mask = (CDC_CD_RW | CDC_DVD_R | CDC_DVD_RAM |
+ CDC_SELECT_DISC | CDC_SELECT_SPEED |
+ CDC_MRW | CDC_MRW_W | CDC_RAM);
+
+ if (register_cdrom(&(vcd->vcd_cdrom_info)) != 0) {
+ printk(KERN_WARNING "%s() Cannot register blkdev as a cdrom %d!\n", __func__,
+ gd->major);
+ goto err_out;
+ }
+ gd->fops = &xencdrom_bdops;
+ vcd->vcd_cdrom_info.disk = gd;
+
+ spin_lock(&vcd_disks_lock);
+ list_add(&(vcd->vcd_entry), &vcd_disks);
+ spin_unlock(&vcd_disks_lock);
+out:
+ return;
+err_out:
+ kfree(vcd);
+}
+
+void unregister_vcd(struct blkfront_info *info) {
+ struct gendisk *gd = info->gd;
+ struct vcd_disk *vcd;
+
+ spin_lock(&vcd_disks_lock);
+ list_for_each_entry(vcd, &vcd_disks, vcd_entry) {
+ if (vcd->vcd_cdrom_info.disk == gd) {
+ spin_lock(&vcd->vcd_cdrom_info_lock);
+ unregister_cdrom(&vcd->vcd_cdrom_info);
+ list_del(&vcd->vcd_entry);
+ spin_unlock(&vcd->vcd_cdrom_info_lock);
+ kfree(vcd);
+ break;
+ }
+ }
+ spin_unlock(&vcd_disks_lock);
+}
+
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ sle11sp1-2010-03-22/include/xen/interface/io/cdromif.h 2009-10-15 12:13:13.000000000 +0200
@@ -0,0 +1,120 @@
+/******************************************************************************
+ * cdromif.h
+ *
+ * Shared definitions between backend driver and Xen guest Virtual CDROM
+ * block device.
+ *
+ * Copyright (c) 2008, Pat Campell plc@novell.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __XEN_PUBLIC_IO_CDROMIF_H__
+#define __XEN_PUBLIC_IO_CDROMIF_H__
+
+/*
+ * Queries backend for CDROM support
+ */
+#define XEN_TYPE_CDROM_SUPPORT _IO('c', 1)
+
+struct xen_cdrom_support
+{
+ uint32_t type;
+ int8_t ret; /* returned, 0 succeeded, -1 error */
+ int8_t err; /* returned, backend errno */
+ int8_t supported; /* returned, 1 supported */
+};
+
+/*
+ * Opens backend device, returns drive geometry or
+ * any encountered errors
+ */
+#define XEN_TYPE_CDROM_OPEN _IO('c', 2)
+
+struct xen_cdrom_open
+{
+ uint32_t type;
+ int8_t ret;
+ int8_t err;
+ int8_t pad;
+ int8_t media_present; /* returned */
+ uint32_t sectors; /* returned */
+ uint32_t sector_size; /* returned */
+ int32_t payload_offset; /* offset to backend node name payload */
+};
+
+/*
+ * Queries backend for media changed status
+ */
+#define XEN_TYPE_CDROM_MEDIA_CHANGED _IO('c', 3)
+
+struct xen_cdrom_media_changed
+{
+ uint32_t type;
+ int8_t ret;
+ int8_t err;
+ int8_t media_changed; /* returned */
+};
+
+/*
+ * Sends vcd generic CDROM packet to backend, followed
+ * immediately by the vcd_generic_command payload
+ */
+#define XEN_TYPE_CDROM_PACKET _IO('c', 4)
+
+struct xen_cdrom_packet
+{
+ uint32_t type;
+ int8_t ret;
+ int8_t err;
+ int8_t pad[2];
+ int32_t payload_offset; /* offset to vcd_generic_command payload */
+};
+
+/* CDROM_PACKET_COMMAND, payload for XEN_TYPE_CDROM_PACKET */
+struct vcd_generic_command
+{
+ uint8_t cmd[CDROM_PACKET_SIZE];
+ uint8_t pad[4];
+ uint32_t buffer_offset;
+ uint32_t buflen;
+ int32_t stat;
+ uint32_t sense_offset;
+ uint8_t data_direction;
+ uint8_t pad1[3];
+ int32_t quiet;
+ int32_t timeout;
+};
+
+union xen_block_packet
+{
+ uint32_t type;
+ struct xen_cdrom_support xcs;
+ struct xen_cdrom_open xco;
+ struct xen_cdrom_media_changed xcmc;
+ struct xen_cdrom_packet xcp;
+};
+
+#define PACKET_PAYLOAD_OFFSET (sizeof(struct xen_cdrom_packet))
+#define PACKET_SENSE_OFFSET (PACKET_PAYLOAD_OFFSET + sizeof(struct vcd_generic_command))
+#define PACKET_BUFFER_OFFSET (PACKET_SENSE_OFFSET + sizeof(struct request_sense))
+#define MAX_PACKET_DATA (PAGE_SIZE - sizeof(struct xen_cdrom_packet) - \
+ sizeof(struct vcd_generic_command) - sizeof(struct request_sense))
+
+#endif


@ -0,0 +1,219 @@
Subject: 32-on-64 blkif protocol negotiation fallback for old guests.
From: kraxel@suse.de
References: 244055
Patch-mainline: never.
See the comment below. Oh well.
--- sle11sp1-2010-03-29.orig/drivers/xen/Kconfig 2010-03-26 08:39:39.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/Kconfig 2010-03-29 09:12:44.000000000 +0200
@@ -29,6 +29,9 @@ config XEN_PRIVCMD
def_bool y
depends on PROC_FS
+config XEN_DOMCTL
+ tristate
+
config XEN_XENBUS_DEV
def_bool y
depends on PROC_FS
@@ -48,6 +51,7 @@ config XEN_BLKDEV_BACKEND
tristate "Block-device backend driver"
depends on XEN_BACKEND
default XEN_BACKEND
+ select XEN_DOMCTL
help
The block-device backend driver allows the kernel to export its
block devices to other guests via a high-performance shared-memory
@@ -57,6 +61,7 @@ config XEN_BLKDEV_TAP
tristate "Block-device tap backend driver"
depends on XEN_BACKEND
default XEN_BACKEND
+ select XEN_DOMCTL
help
The block tap driver is an alternative to the block back driver
and allows VM block requests to be redirected to userspace through
--- sle11sp1-2010-03-29.orig/drivers/xen/blkback/xenbus.c 2010-03-22 12:53:24.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/blkback/xenbus.c 2010-03-22 12:53:34.000000000 +0100
@@ -21,6 +21,7 @@
#include <linux/module.h>
#include <linux/kthread.h>
#include "common.h"
+#include "../core/domctl.h"
#undef DPRINTK
#define DPRINTK(fmt, args...) \
@@ -492,8 +493,10 @@ static int connect_ring(struct backend_i
be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
"%63s", protocol, NULL);
- if (err)
- strcpy(protocol, "unspecified, assuming native");
+ if (err) {
+ strcpy(protocol, "unspecified");
+ be->blkif->blk_protocol = xen_guest_blkif_protocol(be->blkif->domid);
+ }
else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_NATIVE))
be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_32))
--- sle11sp1-2010-03-29.orig/drivers/xen/blktap/xenbus.c 2010-01-27 14:59:26.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/blktap/xenbus.c 2010-01-27 15:00:09.000000000 +0100
@@ -39,6 +39,7 @@
#include <linux/kthread.h>
#include <xen/xenbus.h>
#include "common.h"
+#include "../core/domctl.h"
struct backend_info
@@ -432,8 +433,10 @@ static int connect_ring(struct backend_i
be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
err = xenbus_gather(XBT_NIL, dev->otherend, "protocol",
"%63s", protocol, NULL);
- if (err)
- strcpy(protocol, "unspecified, assuming native");
+ if (err) {
+ strcpy(protocol, "unspecified");
+ be->blkif->blk_protocol = xen_guest_blkif_protocol(be->blkif->domid);
+ }
else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_NATIVE))
be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_32))
--- sle11sp1-2010-03-29.orig/drivers/xen/core/Makefile 2009-11-06 10:52:02.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/core/Makefile 2010-01-04 16:17:00.000000000 +0100
@@ -12,4 +12,7 @@ obj-$(CONFIG_XEN_SYSFS) += xen_sysfs.o
obj-$(CONFIG_XEN_SMPBOOT) += smpboot.o
obj-$(CONFIG_SMP) += spinlock.o
obj-$(CONFIG_KEXEC) += machine_kexec.o
+obj-$(CONFIG_XEN_DOMCTL) += domctl.o
+CFLAGS_domctl.o := -D__XEN_PUBLIC_XEN_H__ -D__XEN_PUBLIC_GRANT_TABLE_H__
+CFLAGS_domctl.o += -D__XEN_TOOLS__ -imacros xen/interface/domctl.h
obj-$(CONFIG_XEN_XENCOMM) += xencomm.o
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ sle11sp1-2010-03-29/drivers/xen/core/domctl.c 2010-01-04 16:15:58.000000000 +0100
@@ -0,0 +1,120 @@
+/*
+ * !!! dirty hack alert !!!
+ *
+ * Problem: old guests kernels don't have a "protocol" node
+ * in the frontend xenstore directory, so mixing
+ * 32 and 64bit domains doesn't work.
+ *
+ * Upstream plans to solve this in the tools, by letting them
+ * create a protocol node. Which certainly makes sense.
+ * But it isn't trivial and isn't done yet. Too bad.
+ *
+ * So for the time being we use the get_address_size domctl
+ * hypercall for a pretty good guess. Not nice as the domctl
+ * hypercall isn't supposed to be used by the kernel. Because
+ * we don't want to have dependencies between dom0 kernel and
+ * xen kernel versions. Now we have one. Ouch.
+ */
+#undef __XEN_PUBLIC_XEN_H__
+#undef __XEN_PUBLIC_GRANT_TABLE_H__
+#undef __XEN_TOOLS__
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <asm/hypervisor.h>
+#include <xen/blkif.h>
+
+#include "domctl.h"
+
+/* stuff copied from xen/interface/domctl.h, which we can't
+ * include directly for the reasons outlined above .... */
+
+typedef struct xen_domctl_address_size {
+ uint32_t size;
+} xen_domctl_address_size_t;
+
+typedef __attribute__((aligned(8))) uint64_t uint64_aligned_t;
+
+union xen_domctl {
+ /* v4: sles10 sp1: xen 3.0.4 + 32-on-64 patches */
+ struct {
+ uint32_t cmd;
+ uint32_t interface_version;
+ domid_t domain;
+ union {
+ /* left out lots of other struct xen_domctl_foobar */
+ struct xen_domctl_address_size address_size;
+ uint64_t dummy_align;
+ uint8_t dummy_pad[128];
+ };
+ } v4;
+
+ /* v5: upstream: xen 3.1, v6: upstream: xen 4.0 */
+ struct {
+ uint32_t cmd;
+ uint32_t interface_version;
+ domid_t domain;
+ union {
+ struct xen_domctl_address_size address_size;
+ uint64_aligned_t dummy_align;
+ uint8_t dummy_pad[128];
+ };
+ } v5, v6;
+};
+
+/* The actual code comes here */
+
+static inline int hypervisor_domctl(void *domctl)
+{
+ return _hypercall1(int, domctl, domctl);
+}
+
+int xen_guest_address_size(int domid)
+{
+ union xen_domctl domctl;
+ int low, ret;
+
+#define guest_address_size(ver) do { \
+ memset(&domctl, 0, sizeof(domctl)); \
+ domctl.v##ver.cmd = XEN_DOMCTL_get_address_size; \
+ domctl.v##ver.interface_version = low = ver; \
+ domctl.v##ver.domain = domid; \
+ ret = hypervisor_domctl(&domctl) ?: domctl.v##ver.address_size.size; \
+ if (ret == 32 || ret == 64) { \
+ printk("v" #ver " domctl worked ok: dom%d is %d-bit\n", \
+ domid, ret); \
+ return ret; \
+ } \
+} while (0)
+
+ BUILD_BUG_ON(XEN_DOMCTL_INTERFACE_VERSION > 6);
+ guest_address_size(6);
+#if CONFIG_XEN_COMPAT < 0x040000
+ guest_address_size(5);
+#endif
+#if CONFIG_XEN_COMPAT < 0x030100
+ guest_address_size(4);
+#endif
+
+ ret = BITS_PER_LONG;
+ printk("v%d...6 domctls failed, assuming dom%d is native: %d\n",
+ low, domid, ret);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(xen_guest_address_size);
+
+int xen_guest_blkif_protocol(int domid)
+{
+ int address_size = xen_guest_address_size(domid);
+
+ if (address_size == BITS_PER_LONG)
+ return BLKIF_PROTOCOL_NATIVE;
+ if (address_size == 32)
+ return BLKIF_PROTOCOL_X86_32;
+ if (address_size == 64)
+ return BLKIF_PROTOCOL_X86_64;
+ return BLKIF_PROTOCOL_NATIVE;
+}
+EXPORT_SYMBOL_GPL(xen_guest_blkif_protocol);
+
+MODULE_LICENSE("GPL");
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ sle11sp1-2010-03-29/drivers/xen/core/domctl.h 2008-09-15 15:10:39.000000000 +0200
@@ -0,0 +1,2 @@
+int xen_guest_address_size(int domid);
+int xen_guest_blkif_protocol(int domid);

xen-blktap-write-barriers Normal file
@@ -0,0 +1,105 @@
From: kwolf@suse.de
Subject: blktap: Write Barriers
Patch-mainline: obsolete
--- sle11sp1-2010-01-27.orig/drivers/xen/blktap/blktap.c 2010-01-04 12:41:47.000000000 +0100
+++ sle11sp1-2010-01-27/drivers/xen/blktap/blktap.c 2010-01-04 13:22:24.000000000 +0100
@@ -1366,6 +1366,9 @@ static int do_block_io_op(blkif_t *blkif
dispatch_rw_block_io(blkif, &req, pending_req);
break;
+ case BLKIF_OP_WRITE_BARRIER:
+ /* TODO Some counter? */
+ /* Fall through */
case BLKIF_OP_WRITE:
blkif->st_wr_req++;
dispatch_rw_block_io(blkif, &req, pending_req);
@@ -1397,7 +1400,7 @@ static void dispatch_rw_block_io(blkif_t
pending_req_t *pending_req)
{
extern void ll_rw_block(int rw, int nr, struct buffer_head * bhs[]);
- int op, operation = (req->operation == BLKIF_OP_WRITE) ? WRITE : READ;
+ int op, operation;
struct gnttab_map_grant_ref map[BLKIF_MAX_SEGMENTS_PER_REQUEST*2];
unsigned int nseg;
int ret, i, nr_sects = 0;
@@ -1409,6 +1412,21 @@ static void dispatch_rw_block_io(blkif_t
struct mm_struct *mm;
struct vm_area_struct *vma = NULL;
+ switch (req->operation) {
+ case BLKIF_OP_READ:
+ operation = READ;
+ break;
+ case BLKIF_OP_WRITE:
+ operation = WRITE;
+ break;
+ case BLKIF_OP_WRITE_BARRIER:
+ operation = WRITE_BARRIER;
+ break;
+ default:
+ operation = 0; /* make gcc happy */
+ BUG();
+ }
+
if (blkif->dev_num < 0 || blkif->dev_num > MAX_TAP_DEV)
goto fail_response;
@@ -1448,7 +1466,7 @@ static void dispatch_rw_block_io(blkif_t
pending_req->blkif = blkif;
pending_req->id = req->id;
- pending_req->operation = operation;
+ pending_req->operation = req->operation;
pending_req->status = BLKIF_RSP_OKAY;
pending_req->nr_pages = nseg;
op = 0;
@@ -1465,7 +1483,7 @@ static void dispatch_rw_block_io(blkif_t
kvaddr = idx_to_kaddr(mmap_idx, pending_idx, i);
flags = GNTMAP_host_map;
- if (operation == WRITE)
+ if (operation != READ)
flags |= GNTMAP_readonly;
gnttab_set_map_op(&map[op], kvaddr, flags,
req->seg[i].gref, blkif->domid);
@@ -1482,7 +1500,7 @@ static void dispatch_rw_block_io(blkif_t
flags = GNTMAP_host_map | GNTMAP_application_map
| GNTMAP_contains_pte;
- if (operation == WRITE)
+ if (operation != READ)
flags |= GNTMAP_readonly;
gnttab_set_map_op(&map[op], ptep, flags,
req->seg[i].gref, blkif->domid);
--- sle11sp1-2010-01-27.orig/drivers/xen/blktap/xenbus.c 2010-01-27 15:00:09.000000000 +0100
+++ sle11sp1-2010-01-27/drivers/xen/blktap/xenbus.c 2010-01-27 15:00:31.000000000 +0100
@@ -401,7 +401,28 @@ static void connect(struct backend_info
int err;
struct xenbus_device *dev = be->dev;
+ struct xenbus_transaction xbt;
+ /* Write feature-barrier to xenstore */
+again:
+ err = xenbus_transaction_start(&xbt);
+ if (err) {
+ xenbus_dev_fatal(dev, err, "starting transaction");
+ return;
+ }
+
+ err = xenbus_printf(xbt, dev->nodename, "feature-barrier", "1");
+ if (err) {
+ xenbus_dev_fatal(dev, err, "writing feature-barrier");
+ xenbus_transaction_end(xbt, 1);
+ return;
+ }
+
+ err = xenbus_transaction_end(xbt, 0);
+ if (err == -EAGAIN)
+ goto again;
+
+ /* Switch state */
err = xenbus_switch_state(dev, XenbusStateConnected);
if (err)
xenbus_dev_fatal(dev, err, "switching to Connected state",

@@ -0,0 +1,74 @@
From: jbeulich@novell.com
Subject: allow number of guest devices to be configurable
Patch-mainline: obsolete
... and derive NR_DYNIRQS from this (rather than having a hard-coded
value).
Similarly, allow the number of simultaneous transmits in netback to be
configurable.
--- sle11sp1-2010-03-29.orig/arch/x86/include/mach-xen/asm/irq_vectors.h 2009-11-06 10:52:09.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/include/mach-xen/asm/irq_vectors.h 2009-12-22 13:21:47.000000000 +0100
@@ -89,7 +89,7 @@ extern int nr_pirqs;
#endif
#define DYNIRQ_BASE (PIRQ_BASE + nr_pirqs)
-#define NR_DYNIRQS 256
+#define NR_DYNIRQS (64 + CONFIG_XEN_NR_GUEST_DEVICES)
#define NR_IRQS (NR_PIRQS + NR_DYNIRQS)
--- sle11sp1-2010-03-29.orig/drivers/xen/Kconfig 2010-03-29 09:13:07.000000000 +0200
+++ sle11sp1-2010-03-29/drivers/xen/Kconfig 2010-03-29 09:13:14.000000000 +0200
@@ -97,6 +97,15 @@ config XEN_NETDEV_BACKEND
network devices to other guests via a high-performance shared-memory
interface.
+config XEN_NETDEV_TX_SHIFT
+ int "Maximum simultaneous transmit requests (as a power of 2)"
+ depends on XEN_NETDEV_BACKEND
+ range 5 16
+ default 8
+ help
+	  The maximum number of transmits the driver can hold pending, expressed
+ as the exponent of a power of 2.
+
config XEN_NETDEV_PIPELINED_TRANSMITTER
bool "Pipelined transmitter (DANGEROUS)"
depends on XEN_NETDEV_BACKEND
@@ -308,6 +317,16 @@ config XEN_SYSFS
help
Xen hypervisor attributes will show up under /sys/hypervisor/.
+config XEN_NR_GUEST_DEVICES
+ int "Number of guest devices"
+ range 0 4032 if 64BIT
+ range 0 960
+ default 256 if XEN_BACKEND
+ default 16
+ help
+ Specify the total number of virtual devices (i.e. both frontend
+ and backend) that you want the kernel to be able to service.
+
choice
prompt "Xen version compatibility"
default XEN_COMPAT_030002_AND_LATER
--- sle11sp1-2010-03-29.orig/drivers/xen/netback/netback.c 2010-01-04 13:31:26.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/netback/netback.c 2010-01-04 13:31:38.000000000 +0100
@@ -71,7 +71,7 @@ static DECLARE_TASKLET(net_rx_tasklet, n
static struct timer_list net_timer;
static struct timer_list netbk_tx_pending_timer;
-#define MAX_PENDING_REQS 256
+#define MAX_PENDING_REQS (1U << CONFIG_XEN_NETDEV_TX_SHIFT)
static struct sk_buff_head rx_queue;
@@ -1265,6 +1265,7 @@ static void net_tx_action(unsigned long
net_tx_action_dealloc();
mop = tx_map_ops;
+ BUILD_BUG_ON(MAX_SKB_FRAGS >= MAX_PENDING_REQS);
while (((NR_PENDING_REQS + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
!list_empty(&net_schedule_list)) {
/* Get a netif from the list with work to do. */

xen-cpufreq-report Normal file
@@ -0,0 +1,57 @@
From: jbeulich@novell.com
Subject: make /proc/cpuinfo track CPU speed
Patch-mainline: obsolete
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/acpi/processor_extcntl_xen.c 2010-03-22 12:00:53.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/acpi/processor_extcntl_xen.c 2010-03-22 12:57:38.000000000 +0100
@@ -206,3 +206,14 @@ void arch_acpi_processor_init_extcntl(co
*ops = &xen_extcntl_ops;
}
EXPORT_SYMBOL(arch_acpi_processor_init_extcntl);
+
+unsigned int cpufreq_quick_get(unsigned int cpu)
+{
+ xen_platform_op_t op = {
+ .cmd = XENPF_get_cpu_freq,
+ .interface_version = XENPF_INTERFACE_VERSION,
+ .u.get_cpu_freq.vcpu = cpu
+ };
+
+ return HYPERVISOR_platform_op(&op) == 0 ? op.u.get_cpu_freq.freq : 0;
+}
--- sle11sp1-2010-03-22.orig/include/linux/cpufreq.h 2010-03-22 12:07:53.000000000 +0100
+++ sle11sp1-2010-03-22/include/linux/cpufreq.h 2009-11-06 11:09:27.000000000 +0100
@@ -302,7 +302,7 @@ static inline unsigned int cpufreq_get(u
#endif
/* query the last known CPU freq (in kHz). If zero, cpufreq couldn't detect it */
-#ifdef CONFIG_CPU_FREQ
+#if defined(CONFIG_CPU_FREQ) || defined(CONFIG_PROCESSOR_EXTERNAL_CONTROL)
unsigned int cpufreq_quick_get(unsigned int cpu);
#else
static inline unsigned int cpufreq_quick_get(unsigned int cpu)
--- sle11sp1-2010-03-22.orig/include/xen/interface/platform.h 2010-01-04 11:56:34.000000000 +0100
+++ sle11sp1-2010-03-22/include/xen/interface/platform.h 2010-01-04 13:31:04.000000000 +0100
@@ -355,6 +355,14 @@ struct xenpf_mem_hotadd
uint32_t flags;
};
+#define XENPF_get_cpu_freq ('N' << 24)
+struct xenpf_get_cpu_freq {
+ /* IN variables */
+ uint32_t vcpu;
+ /* OUT variables */
+ uint32_t freq; /* in kHz */
+};
+
struct xen_platform_op {
uint32_t cmd;
uint32_t interface_version; /* XENPF_INTERFACE_VERSION */
@@ -374,6 +382,7 @@ struct xen_platform_op {
struct xenpf_cpu_ol cpu_ol;
struct xenpf_cpu_hotadd cpu_add;
struct xenpf_mem_hotadd mem_add;
+ struct xenpf_get_cpu_freq get_cpu_freq;
uint8_t pad[128];
} u;
};

xen-dcdbas Normal file
@@ -0,0 +1,280 @@
From: jbeulich@novell.com
Subject: force proper address translation in DCDBAS
Patch-mainline: n/a
The only caveat is that this doesn't work when Dom0 has its vCPU-s pinned.
--- head-2010-01-04.orig/drivers/firmware/Kconfig 2009-11-06 11:10:32.000000000 +0100
+++ head-2010-01-04/drivers/firmware/Kconfig 2009-10-21 12:05:13.000000000 +0200
@@ -90,6 +90,7 @@ config DELL_RBU
config DCDBAS
tristate "Dell Systems Management Base Driver"
depends on X86
+ select XEN_DOMCTL if XEN
help
The Dell Systems Management Base Driver provides a sysfs interface
for systems management software to perform System Management
--- head-2010-01-04.orig/drivers/firmware/dcdbas.c 2010-01-04 16:15:10.000000000 +0100
+++ head-2010-01-04/drivers/firmware/dcdbas.c 2009-10-21 14:18:16.000000000 +0200
@@ -36,6 +36,10 @@
#include <linux/mutex.h>
#include <asm/io.h>
+#ifdef CONFIG_XEN
+#include "../xen/core/domctl.h"
+#endif
+
#include "dcdbas.h"
#define DRIVER_NAME "dcdbas"
@@ -106,7 +110,7 @@ static int smi_data_buf_realloc(unsigned
/* set up new buffer for use */
smi_data_buf = buf;
smi_data_buf_handle = handle;
- smi_data_buf_phys_addr = (u32) virt_to_phys(buf);
+ smi_data_buf_phys_addr = (u32) handle;
smi_data_buf_size = size;
dev_dbg(&dcdbas_pdev->dev, "%s: phys: %x size: %lu\n",
@@ -244,7 +248,9 @@ static ssize_t host_control_on_shutdown_
*/
int dcdbas_smi_request(struct smi_cmd *smi_cmd)
{
+#ifndef CONFIG_XEN
cpumask_var_t old_mask;
+#endif
int ret = 0;
if (smi_cmd->magic != SMI_CMD_MAGIC) {
@@ -254,6 +260,7 @@ int dcdbas_smi_request(struct smi_cmd *s
}
/* SMI requires CPU 0 */
+#ifndef CONFIG_XEN
if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
return -ENOMEM;
@@ -265,6 +272,14 @@ int dcdbas_smi_request(struct smi_cmd *s
ret = -EBUSY;
goto out;
}
+#else
+ ret = xen_set_physical_cpu_affinity(0);
+ if (ret) {
+ dev_dbg(&dcdbas_pdev->dev, "%s: failed (%d) to get CPU 0\n",
+ __func__, ret);
+ return ret;
+ }
+#endif
/* generate SMI */
asm volatile (
@@ -277,9 +292,13 @@ int dcdbas_smi_request(struct smi_cmd *s
: "memory"
);
+#ifndef CONFIG_XEN
out:
set_cpus_allowed_ptr(current, old_mask);
free_cpumask_var(old_mask);
+#else
+ xen_set_physical_cpu_affinity(-1);
+#endif
return ret;
}
@@ -319,7 +338,7 @@ static ssize_t smi_request_store(struct
break;
case 1:
/* Calling Interface SMI */
- smi_cmd->ebx = (u32) virt_to_phys(smi_cmd->command_buffer);
+ smi_cmd->ebx = (u32) virt_to_bus(smi_cmd->command_buffer);
ret = dcdbas_smi_request(smi_cmd);
if (!ret)
ret = count;
@@ -600,6 +619,11 @@ static int __init dcdbas_init(void)
{
int error;
+#ifdef CONFIG_XEN
+ if (!is_initial_xendomain())
+ return -ENODEV;
+#endif
+
error = platform_driver_register(&dcdbas_driver);
if (error)
return error;
--- head-2010-01-04.orig/drivers/xen/core/domctl.c 2010-01-04 16:15:58.000000000 +0100
+++ head-2010-01-04/drivers/xen/core/domctl.c 2010-01-04 16:17:59.000000000 +0100
@@ -20,6 +20,8 @@
#undef __XEN_TOOLS__
#include <linux/kernel.h>
#include <linux/module.h>
+#include <linux/gfp.h>
+#include <linux/percpu.h>
#include <asm/hypervisor.h>
#include <xen/blkif.h>
@@ -34,6 +36,29 @@ typedef struct xen_domctl_address_size {
typedef __attribute__((aligned(8))) uint64_t uint64_aligned_t;
+struct xenctl_cpumap_v4 {
+ XEN_GUEST_HANDLE(uint8) bitmap;
+ uint32_t nr_cpus;
+};
+
+struct xenctl_cpumap_v5 {
+ union {
+ XEN_GUEST_HANDLE(uint8) bitmap;
+ uint64_aligned_t _align;
+ };
+ uint32_t nr_cpus;
+};
+
+struct xen_domctl_vcpuaffinity_v4 {
+ uint32_t vcpu;
+ struct xenctl_cpumap_v4 cpumap;
+};
+
+struct xen_domctl_vcpuaffinity_v5 {
+ uint32_t vcpu;
+ struct xenctl_cpumap_v5 cpumap;
+};
+
union xen_domctl {
/* v4: sles10 sp1: xen 3.0.4 + 32-on-64 patches */
struct {
@@ -43,6 +68,7 @@ union xen_domctl {
union {
/* left out lots of other struct xen_domctl_foobar */
struct xen_domctl_address_size address_size;
+ struct xen_domctl_vcpuaffinity_v4 vcpu_affinity;
uint64_t dummy_align;
uint8_t dummy_pad[128];
};
@@ -55,6 +81,7 @@ union xen_domctl {
domid_t domain;
union {
struct xen_domctl_address_size address_size;
+ struct xen_domctl_vcpuaffinity_v5 vcpu_affinity;
uint64_aligned_t dummy_align;
uint8_t dummy_pad[128];
};
@@ -117,4 +144,110 @@ int xen_guest_blkif_protocol(int domid)
}
EXPORT_SYMBOL_GPL(xen_guest_blkif_protocol);
+#ifdef CONFIG_X86
+
+#define vcpuaffinity(what, ver) ({ \
+ memset(&domctl, 0, sizeof(domctl)); \
+ domctl.v##ver.cmd = XEN_DOMCTL_##what##vcpuaffinity; \
+ domctl.v##ver.interface_version = ver; \
+ /* domctl.v##ver.domain = 0; */ \
+ domctl.v##ver.vcpu_affinity.vcpu = smp_processor_id(); \
+ domctl.v##ver.vcpu_affinity.cpumap.nr_cpus = nr; \
+ set_xen_guest_handle(domctl.v##ver.vcpu_affinity.cpumap.bitmap, \
+ mask); \
+ hypervisor_domctl(&domctl); \
+})
+
+static inline int get_vcpuaffinity(unsigned int nr, void *mask)
+{
+ union xen_domctl domctl;
+ int rc;
+
+ BUILD_BUG_ON(XEN_DOMCTL_INTERFACE_VERSION > 6);
+ rc = vcpuaffinity(get, 6);
+#if CONFIG_XEN_COMPAT < 0x040000
+ if (rc)
+ rc = vcpuaffinity(get, 5);
+#endif
+#if CONFIG_XEN_COMPAT < 0x030100
+ if (rc)
+ rc = vcpuaffinity(get, 4);
+#endif
+ return rc;
+}
+
+static inline int set_vcpuaffinity(unsigned int nr, void *mask)
+{
+ union xen_domctl domctl;
+ int rc;
+
+ BUILD_BUG_ON(XEN_DOMCTL_INTERFACE_VERSION > 6);
+ rc = vcpuaffinity(set, 6);
+#if CONFIG_XEN_COMPAT < 0x040000
+ if (rc)
+ rc = vcpuaffinity(set, 5);
+#endif
+#if CONFIG_XEN_COMPAT < 0x030100
+ if (rc)
+ rc = vcpuaffinity(set, 4);
+#endif
+ return rc;
+}
+
+static DEFINE_PER_CPU(void *, saved_pcpu_affinity);
+
+#define BITS_PER_PAGE (PAGE_SIZE * BITS_PER_LONG / sizeof(long))
+
+int xen_set_physical_cpu_affinity(int pcpu)
+{
+ int rc;
+
+ if (!is_initial_xendomain())
+ return -EPERM;
+
+ if (pcpu >= 0) {
+ void *oldmap;
+
+ if (pcpu > BITS_PER_PAGE)
+ return -ERANGE;
+
+ if (percpu_read(saved_pcpu_affinity))
+ return -EBUSY;
+
+ oldmap = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!oldmap)
+ return -ENOMEM;
+
+ rc = get_vcpuaffinity(BITS_PER_PAGE, oldmap);
+ if (!rc) {
+ void *newmap = kzalloc(BITS_TO_LONGS(pcpu + 1)
+ * sizeof(long), GFP_KERNEL);
+
+ if (newmap) {
+ __set_bit(pcpu, newmap);
+ rc = set_vcpuaffinity(pcpu + 1, newmap);
+ kfree(newmap);
+ } else
+ rc = -ENOMEM;
+ }
+
+ if (!rc)
+ percpu_write(saved_pcpu_affinity, oldmap);
+ else
+ free_page((unsigned long)oldmap);
+ } else {
+ if (!percpu_read(saved_pcpu_affinity))
+ return 0;
+ rc = set_vcpuaffinity(BITS_PER_PAGE,
+ percpu_read(saved_pcpu_affinity));
+ free_page((unsigned long)percpu_read(saved_pcpu_affinity));
+ percpu_write(saved_pcpu_affinity, NULL);
+ }
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(xen_set_physical_cpu_affinity);
+
+#endif /* CONFIG_X86 */
+
MODULE_LICENSE("GPL");
--- head-2010-01-04.orig/drivers/xen/core/domctl.h 2008-09-15 15:10:39.000000000 +0200
+++ head-2010-01-04/drivers/xen/core/domctl.h 2009-10-21 13:24:42.000000000 +0200
@@ -1,2 +1,3 @@
int xen_guest_address_size(int domid);
int xen_guest_blkif_protocol(int domid);
+int xen_set_physical_cpu_affinity(int pcpu);

xen-floppy Normal file
@@ -0,0 +1,28 @@
From: jbeulich@novell.com
Subject: Xen: improve floppy behavior
Patch-mainline: n/a
References: bnc#584216
Timing is significantly different from native, both because Xen traps
I/O port accesses and because DMA cannot be used (without intrusive
changes). Due to the overhead of trapped port accesses, I/O is already
slow enough (and Xen doesn't run on very old hardware anyway), so the
situation can easily be improved by not enforcing REALLY_SLOW_IO.
This doesn't completely address the issue - Xen just cannot guarantee
scheduling of a particular vCPU with a maximum latency of about 80us
(needed for the default FIFO threshold value of 10). The only complete
solution would require making ISA DMA usable on Xen.
--- sle11sp1-2010-03-01.orig/drivers/block/floppy.c 2009-12-03 04:51:21.000000000 +0100
+++ sle11sp1-2010-03-01/drivers/block/floppy.c 2010-03-05 09:16:48.000000000 +0100
@@ -147,7 +147,9 @@
#define FLOPPY_SANITY_CHECK
#undef FLOPPY_SILENT_DCL_CLEAR
+#ifndef CONFIG_XEN
#define REALLY_SLOW_IO
+#endif
#define DEBUGT 2
#define DCL_DEBUG /* debug disk change line */

xen-ipi-per-cpu-irq Normal file
@@ -0,0 +1,791 @@
From: jbeulich@novell.com
Subject: fold IPIs onto a single IRQ each
Patch-mainline: obsolete
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/apic/ipi-xen.c 2009-11-06 10:52:02.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/kernel/apic/ipi-xen.c 2009-11-06 11:10:20.000000000 +0100
@@ -21,31 +21,22 @@
#include <xen/evtchn.h>
-DECLARE_PER_CPU(int, ipi_to_irq[NR_IPIS]);
-
-static inline void __send_IPI_one(unsigned int cpu, int vector)
-{
- int irq = per_cpu(ipi_to_irq, cpu)[vector];
- BUG_ON(irq < 0);
- notify_remote_via_irq(irq);
-}
-
static void __send_IPI_shortcut(unsigned int shortcut, int vector)
{
unsigned int cpu;
switch (shortcut) {
case APIC_DEST_SELF:
- __send_IPI_one(smp_processor_id(), vector);
+ notify_remote_via_ipi(vector, smp_processor_id());
break;
case APIC_DEST_ALLBUT:
for_each_online_cpu(cpu)
if (cpu != smp_processor_id())
- __send_IPI_one(cpu, vector);
+ notify_remote_via_ipi(vector, cpu);
break;
case APIC_DEST_ALLINC:
for_each_online_cpu(cpu)
- __send_IPI_one(cpu, vector);
+ notify_remote_via_ipi(vector, cpu);
break;
default:
printk("XXXXXX __send_IPI_shortcut %08x vector %d\n", shortcut,
@@ -63,7 +54,7 @@ void xen_send_IPI_mask_allbutself(const
WARN_ON(!cpumask_subset(cpumask, cpu_online_mask));
for_each_cpu_and(cpu, cpumask, cpu_online_mask)
if (cpu != smp_processor_id())
- __send_IPI_one(cpu, vector);
+ notify_remote_via_ipi(vector, cpu);
local_irq_restore(flags);
}
@@ -75,7 +66,7 @@ void xen_send_IPI_mask(const struct cpum
local_irq_save(flags);
WARN_ON(!cpumask_subset(cpumask, cpu_online_mask));
for_each_cpu_and(cpu, cpumask, cpu_online_mask)
- __send_IPI_one(cpu, vector);
+ notify_remote_via_ipi(vector, cpu);
local_irq_restore(flags);
}
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/irq-xen.c 2010-01-07 11:22:00.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/kernel/irq-xen.c 2010-01-07 11:22:50.000000000 +0100
@@ -312,6 +312,7 @@ void fixup_irqs(void)
affinity = desc->affinity;
if (!irq_has_action(irq) ||
+ (desc->status & IRQ_PER_CPU) ||
cpumask_equal(affinity, cpu_online_mask)) {
spin_unlock(&desc->lock);
continue;
--- sle11sp1-2010-03-29.orig/drivers/xen/Kconfig 2010-03-29 09:12:59.000000000 +0200
+++ sle11sp1-2010-03-29/drivers/xen/Kconfig 2010-03-29 09:13:07.000000000 +0200
@@ -4,6 +4,7 @@
config XEN
bool
+ select IRQ_PER_CPU if SMP
if XEN
config XEN_INTERFACE_VERSION
@@ -350,6 +351,9 @@ endmenu
config HAVE_IRQ_IGNORE_UNHANDLED
def_bool y
+config IRQ_PER_CPU
+ bool
+
config NO_IDLE_HZ
def_bool y
--- sle11sp1-2010-03-29.orig/drivers/xen/core/evtchn.c 2010-02-09 17:18:45.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/core/evtchn.c 2010-02-09 17:18:51.000000000 +0100
@@ -58,6 +58,22 @@ static DEFINE_SPINLOCK(irq_mapping_updat
static int evtchn_to_irq[NR_EVENT_CHANNELS] = {
[0 ... NR_EVENT_CHANNELS-1] = -1 };
+/* IRQ <-> IPI mapping. */
+#ifndef NR_IPIS
+#define NR_IPIS 1
+#endif
+#if defined(CONFIG_SMP) && defined(CONFIG_X86)
+static int ipi_to_irq[NR_IPIS] __read_mostly = {[0 ... NR_IPIS-1] = -1};
+static DEFINE_PER_CPU(int[NR_IPIS], ipi_to_evtchn);
+#else
+#define PER_CPU_IPI_IRQ
+#endif
+#if !defined(CONFIG_SMP) || !defined(PER_CPU_IPI_IRQ)
+#define BUG_IF_IPI(irq) BUG_ON(type_from_irq(irq) == IRQT_IPI)
+#else
+#define BUG_IF_IPI(irq) ((void)(irq))
+#endif
+
/* Binding types. */
enum {
IRQT_UNBOUND,
@@ -116,12 +132,14 @@ static inline u32 mk_irq_info(u32 type,
* Accessors for packed IRQ information.
*/
+#ifdef PER_CPU_IPI_IRQ
static inline unsigned int evtchn_from_irq(int irq)
{
const struct irq_cfg *cfg = irq_cfg(irq);
return cfg ? cfg->info & ((1U << _EVTCHN_BITS) - 1) : 0;
}
+#endif
static inline unsigned int index_from_irq(int irq)
{
@@ -138,14 +156,32 @@ static inline unsigned int type_from_irq
return cfg ? cfg->info >> (32 - _IRQT_BITS) : IRQT_UNBOUND;
}
+#ifndef PER_CPU_IPI_IRQ
+static inline unsigned int evtchn_from_per_cpu_irq(unsigned int irq,
+ unsigned int cpu)
+{
+ BUG_ON(type_from_irq(irq) != IRQT_IPI);
+ return per_cpu(ipi_to_evtchn, cpu)[index_from_irq(irq)];
+}
+
+static inline unsigned int evtchn_from_irq(unsigned int irq)
+{
+ if (type_from_irq(irq) != IRQT_IPI) {
+ const struct irq_cfg *cfg = irq_cfg(irq);
+
+ return cfg ? cfg->info & ((1U << _EVTCHN_BITS) - 1) : 0;
+ }
+ return evtchn_from_per_cpu_irq(irq, smp_processor_id());
+}
+#endif
+
/* IRQ <-> VIRQ mapping. */
DEFINE_PER_CPU(int[NR_VIRQS], virq_to_irq) = {[0 ... NR_VIRQS-1] = -1};
+#if defined(CONFIG_SMP) && defined(PER_CPU_IPI_IRQ)
/* IRQ <-> IPI mapping. */
-#ifndef NR_IPIS
-#define NR_IPIS 1
-#endif
DEFINE_PER_CPU(int[NR_IPIS], ipi_to_irq) = {[0 ... NR_IPIS-1] = -1};
+#endif
#ifdef CONFIG_SMP
@@ -169,8 +205,14 @@ static void bind_evtchn_to_cpu(unsigned
BUG_ON(!test_bit(chn, s->evtchn_mask));
- if (irq != -1)
- cpumask_copy(irq_to_desc(irq)->affinity, cpumask_of(cpu));
+ if (irq != -1) {
+ struct irq_desc *desc = irq_to_desc(irq);
+
+ if (!(desc->status & IRQ_PER_CPU))
+ cpumask_copy(desc->affinity, cpumask_of(cpu));
+ else
+ cpumask_set_cpu(cpu, desc->affinity);
+ }
clear_bit(chn, per_cpu(cpu_evtchn_mask, cpu_evtchn[chn]));
set_bit(chn, per_cpu(cpu_evtchn_mask, cpu));
@@ -344,7 +386,7 @@ asmlinkage void __irq_entry evtchn_do_up
static struct irq_chip dynirq_chip;
-static int find_unbound_irq(unsigned int cpu)
+static int find_unbound_irq(unsigned int cpu, bool percpu)
{
static int warned;
int irq;
@@ -354,10 +396,19 @@ static int find_unbound_irq(unsigned int
struct irq_cfg *cfg = desc->chip_data;
if (!cfg->bindcount) {
+ irq_flow_handler_t handle;
+ const char *name;
+
desc->status |= IRQ_NOPROBE;
+ if (!percpu) {
+ handle = handle_level_irq;
+ name = "level";
+ } else {
+ handle = handle_percpu_irq;
+ name = "percpu";
+ }
set_irq_chip_and_handler_name(irq, &dynirq_chip,
- handle_level_irq,
- "level");
+ handle, name);
return irq;
}
}
@@ -378,7 +429,7 @@ static int bind_caller_port_to_irq(unsig
spin_lock(&irq_mapping_update_lock);
if ((irq = evtchn_to_irq[caller_port]) == -1) {
- if ((irq = find_unbound_irq(smp_processor_id())) < 0)
+ if ((irq = find_unbound_irq(smp_processor_id(), false)) < 0)
goto out;
evtchn_to_irq[caller_port] = irq;
@@ -401,7 +452,7 @@ static int bind_local_port_to_irq(unsign
BUG_ON(evtchn_to_irq[local_port] != -1);
- if ((irq = find_unbound_irq(smp_processor_id())) < 0) {
+ if ((irq = find_unbound_irq(smp_processor_id(), false)) < 0) {
struct evtchn_close close = { .port = local_port };
if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close))
BUG();
@@ -454,7 +505,7 @@ static int bind_virq_to_irq(unsigned int
spin_lock(&irq_mapping_update_lock);
if ((irq = per_cpu(virq_to_irq, cpu)[virq]) == -1) {
- if ((irq = find_unbound_irq(cpu)) < 0)
+ if ((irq = find_unbound_irq(cpu, false)) < 0)
goto out;
bind_virq.virq = virq;
@@ -479,6 +530,7 @@ static int bind_virq_to_irq(unsigned int
return irq;
}
+#if defined(CONFIG_SMP) && defined(PER_CPU_IPI_IRQ)
static int bind_ipi_to_irq(unsigned int ipi, unsigned int cpu)
{
struct evtchn_bind_ipi bind_ipi;
@@ -487,7 +539,7 @@ static int bind_ipi_to_irq(unsigned int
spin_lock(&irq_mapping_update_lock);
if ((irq = per_cpu(ipi_to_irq, cpu)[ipi]) == -1) {
- if ((irq = find_unbound_irq(cpu)) < 0)
+ if ((irq = find_unbound_irq(cpu, false)) < 0)
goto out;
bind_ipi.vcpu = cpu;
@@ -510,6 +562,7 @@ static int bind_ipi_to_irq(unsigned int
spin_unlock(&irq_mapping_update_lock);
return irq;
}
+#endif
static void unbind_from_irq(unsigned int irq)
{
@@ -517,6 +570,7 @@ static void unbind_from_irq(unsigned int
unsigned int cpu;
int evtchn = evtchn_from_irq(irq);
+ BUG_IF_IPI(irq);
spin_lock(&irq_mapping_update_lock);
if (!--irq_cfg(irq)->bindcount && VALID_EVTCHN(evtchn)) {
@@ -530,10 +584,12 @@ static void unbind_from_irq(unsigned int
per_cpu(virq_to_irq, cpu_from_evtchn(evtchn))
[index_from_irq(irq)] = -1;
break;
+#if defined(CONFIG_SMP) && defined(PER_CPU_IPI_IRQ)
case IRQT_IPI:
per_cpu(ipi_to_irq, cpu_from_evtchn(evtchn))
[index_from_irq(irq)] = -1;
break;
+#endif
default:
break;
}
@@ -556,6 +612,46 @@ static void unbind_from_irq(unsigned int
spin_unlock(&irq_mapping_update_lock);
}
+#if defined(CONFIG_SMP) && !defined(PER_CPU_IPI_IRQ)
+void unbind_from_per_cpu_irq(unsigned int irq, unsigned int cpu)
+{
+ struct evtchn_close close;
+ int evtchn = evtchn_from_per_cpu_irq(irq, cpu);
+
+ spin_lock(&irq_mapping_update_lock);
+
+ if (VALID_EVTCHN(evtchn)) {
+ struct irq_desc *desc = irq_to_desc(irq);
+
+ mask_evtchn(evtchn);
+
+ BUG_ON(irq_cfg(irq)->bindcount <= 1);
+ irq_cfg(irq)->bindcount--;
+ cpumask_clear_cpu(cpu, desc->affinity);
+
+ close.port = evtchn;
+ if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close))
+ BUG();
+
+ switch (type_from_irq(irq)) {
+ case IRQT_IPI:
+ per_cpu(ipi_to_evtchn, cpu)[index_from_irq(irq)] = 0;
+ break;
+ default:
+ BUG();
+ break;
+ }
+
+ /* Closed ports are implicitly re-bound to VCPU0. */
+ bind_evtchn_to_cpu(evtchn, 0);
+
+ evtchn_to_irq[evtchn] = -1;
+ }
+
+ spin_unlock(&irq_mapping_update_lock);
+}
+#endif /* CONFIG_SMP && !PER_CPU_IPI_IRQ */
+
int bind_caller_port_to_irqhandler(
unsigned int caller_port,
irq_handler_t handler,
@@ -650,6 +746,8 @@ int bind_virq_to_irqhandler(
}
EXPORT_SYMBOL_GPL(bind_virq_to_irqhandler);
+#ifdef CONFIG_SMP
+#ifdef PER_CPU_IPI_IRQ
int bind_ipi_to_irqhandler(
unsigned int ipi,
unsigned int cpu,
@@ -673,7 +771,71 @@ int bind_ipi_to_irqhandler(
return irq;
}
-EXPORT_SYMBOL_GPL(bind_ipi_to_irqhandler);
+#else
+int __cpuinit bind_ipi_to_irqaction(
+ unsigned int ipi,
+ unsigned int cpu,
+ struct irqaction *action)
+{
+ struct evtchn_bind_ipi bind_ipi;
+ int evtchn, irq, retval = 0;
+
+ spin_lock(&irq_mapping_update_lock);
+
+ if (VALID_EVTCHN(per_cpu(ipi_to_evtchn, cpu)[ipi])) {
+ spin_unlock(&irq_mapping_update_lock);
+ return -EBUSY;
+ }
+
+ if ((irq = ipi_to_irq[ipi]) == -1) {
+ if ((irq = find_unbound_irq(cpu, true)) < 0) {
+ spin_unlock(&irq_mapping_update_lock);
+ return irq;
+ }
+
+ /* Extra reference so count will never drop to zero. */
+ irq_cfg(irq)->bindcount++;
+
+ ipi_to_irq[ipi] = irq;
+ irq_cfg(irq)->info = mk_irq_info(IRQT_IPI, ipi, 0);
+ retval = 1;
+ }
+
+ bind_ipi.vcpu = cpu;
+ if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_ipi,
+ &bind_ipi) != 0)
+ BUG();
+
+ evtchn = bind_ipi.port;
+ evtchn_to_irq[evtchn] = irq;
+ per_cpu(ipi_to_evtchn, cpu)[ipi] = evtchn;
+
+ bind_evtchn_to_cpu(evtchn, cpu);
+
+ irq_cfg(irq)->bindcount++;
+
+ spin_unlock(&irq_mapping_update_lock);
+
+ if (retval == 0) {
+ unsigned long flags;
+
+ local_irq_save(flags);
+ unmask_evtchn(evtchn);
+ local_irq_restore(flags);
+ } else {
+ action->flags |= IRQF_PERCPU | IRQF_NO_SUSPEND;
+ retval = setup_irq(irq, action);
+ if (retval) {
+ unbind_from_per_cpu_irq(irq, cpu);
+ BUG_ON(retval > 0);
+ irq = retval;
+ }
+ }
+
+ return irq;
+}
+#endif /* PER_CPU_IPI_IRQ */
+#endif /* CONFIG_SMP */
void unbind_from_irqhandler(unsigned int irq, void *dev_id)
{
@@ -699,6 +861,7 @@ static void rebind_irq_to_cpu(unsigned i
{
int evtchn = evtchn_from_irq(irq);
+ BUG_IF_IPI(irq);
if (VALID_EVTCHN(evtchn))
rebind_evtchn_to_cpu(evtchn, tcpu);
}
@@ -784,6 +947,7 @@ static struct irq_chip dynirq_chip = {
.unmask = unmask_dynirq,
.mask_ack = ack_dynirq,
.ack = ack_dynirq,
+ .eoi = end_dynirq,
.end = end_dynirq,
#ifdef CONFIG_SMP
.set_affinity = set_affinity_irq,
@@ -963,10 +1127,21 @@ int irq_ignore_unhandled(unsigned int ir
return !!(irq_status.flags & XENIRQSTAT_shared);
}
+#if defined(CONFIG_SMP) && !defined(PER_CPU_IPI_IRQ)
+void notify_remote_via_ipi(unsigned int ipi, unsigned int cpu)
+{
+ int evtchn = evtchn_from_per_cpu_irq(ipi_to_irq[ipi], cpu);
+
+ if (VALID_EVTCHN(evtchn))
+ notify_remote_via_evtchn(evtchn);
+}
+#endif
+
void notify_remote_via_irq(int irq)
{
int evtchn = evtchn_from_irq(irq);
+ BUG_IF_IPI(irq);
if (VALID_EVTCHN(evtchn))
notify_remote_via_evtchn(evtchn);
}
@@ -974,6 +1149,7 @@ EXPORT_SYMBOL_GPL(notify_remote_via_irq)
int irq_to_evtchn_port(int irq)
{
+ BUG_IF_IPI(irq);
return evtchn_from_irq(irq);
}
EXPORT_SYMBOL_GPL(irq_to_evtchn_port);
@@ -1089,11 +1265,17 @@ static void restore_cpu_virqs(unsigned i
static void restore_cpu_ipis(unsigned int cpu)
{
+#ifdef CONFIG_SMP
struct evtchn_bind_ipi bind_ipi;
int ipi, irq, evtchn;
for (ipi = 0; ipi < NR_IPIS; ipi++) {
+#ifdef PER_CPU_IPI_IRQ
if ((irq = per_cpu(ipi_to_irq, cpu)[ipi]) == -1)
+#else
+ if ((irq = ipi_to_irq[ipi]) == -1
+ || !VALID_EVTCHN(per_cpu(ipi_to_evtchn, cpu)[ipi]))
+#endif
continue;
BUG_ON(irq_cfg(irq)->info != mk_irq_info(IRQT_IPI, ipi, 0));
@@ -1107,13 +1289,18 @@ static void restore_cpu_ipis(unsigned in
/* Record the new mapping. */
evtchn_to_irq[evtchn] = irq;
+#ifdef PER_CPU_IPI_IRQ
irq_cfg(irq)->info = mk_irq_info(IRQT_IPI, ipi, evtchn);
+#else
+ per_cpu(ipi_to_evtchn, cpu)[ipi] = evtchn;
+#endif
bind_evtchn_to_cpu(evtchn, cpu);
/* Ready for use. */
if (!(irq_to_desc(irq)->status & IRQ_DISABLED))
unmask_evtchn(evtchn);
}
+#endif
}
static int evtchn_resume(struct sys_device *dev)
--- sle11sp1-2010-03-29.orig/drivers/xen/core/smpboot.c 2010-03-22 12:57:24.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/core/smpboot.c 2010-03-22 12:57:46.000000000 +0100
@@ -40,14 +40,10 @@ cpumask_var_t vcpu_initialized_mask;
DEFINE_PER_CPU(struct cpuinfo_x86, cpu_info);
EXPORT_PER_CPU_SYMBOL(cpu_info);
-static DEFINE_PER_CPU(int, resched_irq);
-static DEFINE_PER_CPU(int, callfunc_irq);
-static DEFINE_PER_CPU(int, call1func_irq);
-static DEFINE_PER_CPU(int, reboot_irq);
-static char resched_name[NR_CPUS][15];
-static char callfunc_name[NR_CPUS][15];
-static char call1func_name[NR_CPUS][15];
-static char reboot_name[NR_CPUS][15];
+static int __read_mostly resched_irq = -1;
+static int __read_mostly callfunc_irq = -1;
+static int __read_mostly call1func_irq = -1;
+static int __read_mostly reboot_irq = -1;
#ifdef CONFIG_X86_LOCAL_APIC
#define set_cpu_to_apicid(cpu, apicid) (per_cpu(x86_cpu_to_apicid, cpu) = (apicid))
@@ -109,58 +105,68 @@ remove_siblinginfo(unsigned int cpu)
static int __cpuinit xen_smp_intr_init(unsigned int cpu)
{
+ static struct irqaction resched_action = {
+ .handler = smp_reschedule_interrupt,
+ .flags = IRQF_DISABLED,
+ .name = "resched"
+ }, callfunc_action = {
+ .handler = smp_call_function_interrupt,
+ .flags = IRQF_DISABLED,
+ .name = "callfunc"
+ }, call1func_action = {
+ .handler = smp_call_function_single_interrupt,
+ .flags = IRQF_DISABLED,
+ .name = "call1func"
+ }, reboot_action = {
+ .handler = smp_reboot_interrupt,
+ .flags = IRQF_DISABLED,
+ .name = "reboot"
+ };
int rc;
- per_cpu(resched_irq, cpu) = per_cpu(callfunc_irq, cpu) =
- per_cpu(call1func_irq, cpu) = per_cpu(reboot_irq, cpu) = -1;
-
- sprintf(resched_name[cpu], "resched%u", cpu);
- rc = bind_ipi_to_irqhandler(RESCHEDULE_VECTOR,
- cpu,
- smp_reschedule_interrupt,
- IRQF_DISABLED|IRQF_NOBALANCING,
- resched_name[cpu],
- NULL);
+ rc = bind_ipi_to_irqaction(RESCHEDULE_VECTOR,
+ cpu,
+ &resched_action);
if (rc < 0)
- goto fail;
- per_cpu(resched_irq, cpu) = rc;
-
- sprintf(callfunc_name[cpu], "callfunc%u", cpu);
- rc = bind_ipi_to_irqhandler(CALL_FUNCTION_VECTOR,
- cpu,
- smp_call_function_interrupt,
- IRQF_DISABLED|IRQF_NOBALANCING,
- callfunc_name[cpu],
- NULL);
+ return rc;
+ if (resched_irq < 0)
+ resched_irq = rc;
+ else
+ BUG_ON(resched_irq != rc);
+
+ rc = bind_ipi_to_irqaction(CALL_FUNCTION_VECTOR,
+ cpu,
+ &callfunc_action);
if (rc < 0)
- goto fail;
- per_cpu(callfunc_irq, cpu) = rc;
-
- sprintf(call1func_name[cpu], "call1func%u", cpu);
- rc = bind_ipi_to_irqhandler(CALL_FUNC_SINGLE_VECTOR,
- cpu,
- smp_call_function_single_interrupt,
- IRQF_DISABLED|IRQF_NOBALANCING,
- call1func_name[cpu],
- NULL);
+ goto unbind_resched;
+ if (callfunc_irq < 0)
+ callfunc_irq = rc;
+ else
+ BUG_ON(callfunc_irq != rc);
+
+ rc = bind_ipi_to_irqaction(CALL_FUNC_SINGLE_VECTOR,
+ cpu,
+ &call1func_action);
if (rc < 0)
- goto fail;
- per_cpu(call1func_irq, cpu) = rc;
-
- sprintf(reboot_name[cpu], "reboot%u", cpu);
- rc = bind_ipi_to_irqhandler(REBOOT_VECTOR,
- cpu,
- smp_reboot_interrupt,
- IRQF_DISABLED|IRQF_NOBALANCING,
- reboot_name[cpu],
- NULL);
+ goto unbind_call;
+ if (call1func_irq < 0)
+ call1func_irq = rc;
+ else
+ BUG_ON(call1func_irq != rc);
+
+ rc = bind_ipi_to_irqaction(REBOOT_VECTOR,
+ cpu,
+ &reboot_action);
if (rc < 0)
- goto fail;
- per_cpu(reboot_irq, cpu) = rc;
+ goto unbind_call1;
+ if (reboot_irq < 0)
+ reboot_irq = rc;
+ else
+ BUG_ON(reboot_irq != rc);
rc = xen_spinlock_init(cpu);
if (rc < 0)
- goto fail;
+ goto unbind_reboot;
if ((cpu != 0) && ((rc = local_setup_timer(cpu)) != 0))
goto fail;
@@ -168,15 +174,15 @@ static int __cpuinit xen_smp_intr_init(u
return 0;
fail:
- if (per_cpu(resched_irq, cpu) >= 0)
- unbind_from_irqhandler(per_cpu(resched_irq, cpu), NULL);
- if (per_cpu(callfunc_irq, cpu) >= 0)
- unbind_from_irqhandler(per_cpu(callfunc_irq, cpu), NULL);
- if (per_cpu(call1func_irq, cpu) >= 0)
- unbind_from_irqhandler(per_cpu(call1func_irq, cpu), NULL);
- if (per_cpu(reboot_irq, cpu) >= 0)
- unbind_from_irqhandler(per_cpu(reboot_irq, cpu), NULL);
xen_spinlock_cleanup(cpu);
+ unbind_reboot:
+ unbind_from_per_cpu_irq(reboot_irq, cpu);
+ unbind_call1:
+ unbind_from_per_cpu_irq(call1func_irq, cpu);
+ unbind_call:
+ unbind_from_per_cpu_irq(callfunc_irq, cpu);
+ unbind_resched:
+ unbind_from_per_cpu_irq(resched_irq, cpu);
return rc;
}
@@ -186,10 +192,10 @@ static void __cpuinit xen_smp_intr_exit(
if (cpu != 0)
local_teardown_timer(cpu);
- unbind_from_irqhandler(per_cpu(resched_irq, cpu), NULL);
- unbind_from_irqhandler(per_cpu(callfunc_irq, cpu), NULL);
- unbind_from_irqhandler(per_cpu(call1func_irq, cpu), NULL);
- unbind_from_irqhandler(per_cpu(reboot_irq, cpu), NULL);
+ unbind_from_per_cpu_irq(resched_irq, cpu);
+ unbind_from_per_cpu_irq(callfunc_irq, cpu);
+ unbind_from_per_cpu_irq(call1func_irq, cpu);
+ unbind_from_per_cpu_irq(reboot_irq, cpu);
xen_spinlock_cleanup(cpu);
}
#endif
--- sle11sp1-2010-03-29.orig/drivers/xen/core/spinlock.c 2010-02-24 16:14:47.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/core/spinlock.c 2010-02-23 14:25:31.000000000 +0100
@@ -14,8 +14,7 @@
#ifdef TICKET_SHIFT
-static DEFINE_PER_CPU(int, spinlock_irq) = -1;
-static char spinlock_name[NR_CPUS][15];
+static int __read_mostly spinlock_irq = -1;
struct spinning {
raw_spinlock_t *lock;
@@ -32,29 +31,31 @@ static DEFINE_PER_CPU(raw_rwlock_t, spin
int __cpuinit xen_spinlock_init(unsigned int cpu)
{
+ static struct irqaction spinlock_action = {
+ .handler = smp_reschedule_interrupt,
+ .flags = IRQF_DISABLED,
+ .name = "spinlock"
+ };
int rc;
- sprintf(spinlock_name[cpu], "spinlock%u", cpu);
- rc = bind_ipi_to_irqhandler(SPIN_UNLOCK_VECTOR,
- cpu,
- smp_reschedule_interrupt,
- IRQF_DISABLED|IRQF_NOBALANCING,
- spinlock_name[cpu],
- NULL);
+ rc = bind_ipi_to_irqaction(SPIN_UNLOCK_VECTOR,
+ cpu,
+ &spinlock_action);
if (rc < 0)
return rc;
- disable_irq(rc); /* make sure it's never delivered */
- per_cpu(spinlock_irq, cpu) = rc;
+ if (spinlock_irq < 0) {
+ disable_irq(rc); /* make sure it's never delivered */
+ spinlock_irq = rc;
+ } else
+ BUG_ON(spinlock_irq != rc);
return 0;
}
void __cpuinit xen_spinlock_cleanup(unsigned int cpu)
{
- if (per_cpu(spinlock_irq, cpu) >= 0)
- unbind_from_irqhandler(per_cpu(spinlock_irq, cpu), NULL);
- per_cpu(spinlock_irq, cpu) = -1;
+ unbind_from_per_cpu_irq(spinlock_irq, cpu);
}
static unsigned int spin_adjust(struct spinning *spinning,
@@ -84,7 +85,7 @@ unsigned int xen_spin_adjust(const raw_s
bool xen_spin_wait(raw_spinlock_t *lock, unsigned int *ptok,
unsigned int flags)
{
- int irq = percpu_read(spinlock_irq);
+ int irq = spinlock_irq;
bool rc;
typeof(vcpu_info(0)->evtchn_upcall_mask) upcall_mask;
raw_rwlock_t *rm_lock;
@@ -240,7 +241,7 @@ void xen_spin_kick(raw_spinlock_t *lock,
raw_local_irq_restore(flags);
if (unlikely(spinning)) {
- notify_remote_via_irq(per_cpu(spinlock_irq, cpu));
+ notify_remote_via_ipi(SPIN_UNLOCK_VECTOR, cpu);
return;
}
}
--- sle11sp1-2010-03-29.orig/include/xen/evtchn.h 2009-12-18 10:13:12.000000000 +0100
+++ sle11sp1-2010-03-29/include/xen/evtchn.h 2009-12-18 10:13:26.000000000 +0100
@@ -92,6 +92,8 @@ int bind_virq_to_irqhandler(
unsigned long irqflags,
const char *devname,
void *dev_id);
+#if defined(CONFIG_SMP) && !defined(MODULE)
+#ifndef CONFIG_X86
int bind_ipi_to_irqhandler(
unsigned int ipi,
unsigned int cpu,
@@ -99,6 +101,13 @@ int bind_ipi_to_irqhandler(
unsigned long irqflags,
const char *devname,
void *dev_id);
+#else
+int bind_ipi_to_irqaction(
+ unsigned int ipi,
+ unsigned int cpu,
+ struct irqaction *action);
+#endif
+#endif
/*
* Common unbind function for all event sources. Takes IRQ to unbind from.
@@ -107,6 +116,11 @@ int bind_ipi_to_irqhandler(
*/
void unbind_from_irqhandler(unsigned int irq, void *dev_id);
+#if defined(CONFIG_SMP) && !defined(MODULE) && defined(CONFIG_X86)
+/* Specialized unbind function for per-CPU IRQs. */
+void unbind_from_per_cpu_irq(unsigned int irq, unsigned int cpu);
+#endif
+
#ifndef CONFIG_XEN
void irq_resume(void);
#endif
@@ -184,5 +198,9 @@ void xen_poll_irq(int irq);
void notify_remote_via_irq(int irq);
int irq_to_evtchn_port(int irq);
+#if defined(CONFIG_SMP) && !defined(MODULE) && defined(CONFIG_X86)
+void notify_remote_via_ipi(unsigned int ipi, unsigned int cpu);
+#endif
+
#endif /* __ASM_EVTCHN_H__ */
#endif /* CONFIG_PARAVIRT_XEN */
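The smpboot.c and spinlock.c hunks above replace per-CPU IRQ numbers with a single shared IRQ that is recorded on the first bind and verified (via BUG_ON) on every later one. A minimal user-space sketch of that record-once pattern; `fake_bind()` is a hypothetical stand-in for `bind_ipi_to_irqaction()`:

```c
#include <assert.h>

/* Hypothetical stand-in for bind_ipi_to_irqaction(): with PER_CPU_IPI_IRQ
 * gone, every CPU binding the same vector gets the same IRQ number. */
static int fake_bind(unsigned int cpu)
{
    (void)cpu;
    return 42;                  /* illustrative IRQ number */
}

static int shared_irq = -1;     /* mirrors e.g. resched_irq above */

/* First binder records the IRQ; later binders must observe the same one
 * (the kernel code BUG()s on a mismatch). */
static int bind_shared(unsigned int cpu)
{
    int rc = fake_bind(cpu);

    if (rc < 0)
        return rc;
    if (shared_irq < 0)
        shared_irq = rc;
    else
        assert(shared_irq == rc);
    return 0;
}
```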

xen-kconfig-compat
From: jbeulich@novell.com
Subject: add 3.2.0-compatibility configure option
Patch-mainline: obsolete
--- sle11sp1-2010-03-29.orig/drivers/xen/Kconfig 2010-03-29 09:12:44.000000000 +0200
+++ sle11sp1-2010-03-29/drivers/xen/Kconfig 2010-03-29 09:12:59.000000000 +0200
@@ -320,6 +320,15 @@ choice
config XEN_COMPAT_030100_AND_LATER
bool "3.1.0 and later"
+ config XEN_COMPAT_030200_AND_LATER
+ bool "3.2.0 and later"
+
+ config XEN_COMPAT_030300_AND_LATER
+ bool "3.3.0 and later"
+
+ config XEN_COMPAT_030400_AND_LATER
+ bool "3.4.0 and later"
+
config XEN_COMPAT_LATEST_ONLY
bool "no compatibility code"
@@ -328,6 +337,9 @@ endchoice
config XEN_COMPAT
hex
default 0xffffff if XEN_COMPAT_LATEST_ONLY
+ default 0x030400 if XEN_COMPAT_030400_AND_LATER
+ default 0x030300 if XEN_COMPAT_030300_AND_LATER
+ default 0x030200 if XEN_COMPAT_030200_AND_LATER
default 0x030100 if XEN_COMPAT_030100_AND_LATER
default 0x030004 if XEN_COMPAT_030004_AND_LATER
default 0x030002 if XEN_COMPAT_030002_AND_LATER
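The Kconfig fragment above maps each compatibility choice to a hex version code through the XEN_COMPAT defaults. A small sketch of that mapping in C; the codes are copied from the defaults above, and the enum names are illustrative:

```c
#include <assert.h>

/* Version codes from the XEN_COMPAT defaults above. */
enum xen_compat_choice {
    COMPAT_030002, COMPAT_030004, COMPAT_030100,
    COMPAT_030200, COMPAT_030300, COMPAT_030400,
    COMPAT_LATEST
};

static unsigned int xen_compat_code(enum xen_compat_choice c)
{
    switch (c) {
    case COMPAT_LATEST: return 0xffffff;
    case COMPAT_030400: return 0x030400;
    case COMPAT_030300: return 0x030300;
    case COMPAT_030200: return 0x030200;
    case COMPAT_030100: return 0x030100;
    case COMPAT_030004: return 0x030004;
    default:            return 0x030002;
    }
}
```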

xen-modular-blktap
From: ccoffing@novell.com
Subject: Retain backwards-compatible module name with CONFIG_XEN_BLKDEV_TAP=m
Patch-mainline: obsolete
--- head-2009-05-29.orig/drivers/xen/blktap/Makefile 2007-06-12 13:13:44.000000000 +0200
+++ head-2009-05-29/drivers/xen/blktap/Makefile 2009-05-29 12:39:04.000000000 +0200
@@ -1,5 +1,5 @@
LINUXINCLUDE += -I../xen/include/public/io
-obj-$(CONFIG_XEN_BLKDEV_TAP) := xenblktap.o
+obj-$(CONFIG_XEN_BLKDEV_TAP) := blktap.o
-xenblktap-y := xenbus.o interface.o blktap.o
+blktap-y := xenbus.o interface.o blocktap.o
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ head-2009-05-29/drivers/xen/blktap/blocktap.c 2009-05-29 12:39:04.000000000 +0200
@@ -0,0 +1 @@
+#include "blktap.c"
--- head-2009-05-29.orig/drivers/xen/blktap2/Makefile 2009-05-29 10:25:53.000000000 +0200
+++ head-2009-05-29/drivers/xen/blktap2/Makefile 2009-05-29 12:39:04.000000000 +0200
@@ -1,3 +1,4 @@
-obj-$(CONFIG_XEN_BLKDEV_TAP2) := blktap.o
+obj-$(CONFIG_XEN_BLKDEV_TAP2) := blktap2.o
-blktap-objs := control.o ring.o wait_queue.o device.o request.o sysfs.o
+blktap2-y := control.o ring.o wait_queue.o device.o request.o
+blktap2-$(CONFIG_SYSFS) += sysfs.o

xen-netback-generalize (file diff suppressed because it is too large)

xen-netback-kernel-threads
From: Dongxiao Xu <dongxiao.xu@intel.com>
Subject: [PATCH 3/3] Use Kernel thread to replace the tasklet.
Patch-mainline: n/a
A kernel thread gives more control over QoS, and could improve
dom0's userspace responsiveness.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Subject: xen: ensure locking gnttab_copy_grant_page is safe against interrupts.
Now that netback processing occurs in a thread instead of a tasklet
gnttab_copy_grant_page needs to be safe against interrupts.
The code is currently commented out in this tree but on 2.6.18 we observed a
deadlock where the netback thread called gnttab_copy_grant_page, locked
gnttab_dma_lock for writing, was interrupted and on return from interrupt the
network stack's TX tasklet ended up calling __gnttab_dma_map_page via the
hardware driver->swiotlb and tried to take gnttab_dma_lock for reading.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: "Xu, Dongxiao" <dongxiao.xu@intel.com>
jb: changed write_seq{,un}lock_irq() to write_seq{,un}lock_bh(), and
made the use of kernel threads optional (but default)
Acked-by: jbeulich@novell.com
--- sle11sp1-2010-03-05.orig/drivers/xen/core/gnttab.c 2009-12-15 09:28:00.000000000 +0100
+++ sle11sp1-2010-03-05/drivers/xen/core/gnttab.c 2010-02-02 15:10:01.000000000 +0100
@@ -553,14 +553,14 @@ int gnttab_copy_grant_page(grant_ref_t r
mfn = pfn_to_mfn(pfn);
new_mfn = virt_to_mfn(new_addr);
- write_seqlock(&gnttab_dma_lock);
+ write_seqlock_bh(&gnttab_dma_lock);
/* Make seq visible before checking page_mapped. */
smp_mb();
/* Has the page been DMA-mapped? */
if (unlikely(page_mapped(page))) {
- write_sequnlock(&gnttab_dma_lock);
+ write_sequnlock_bh(&gnttab_dma_lock);
put_page(new_page);
err = -EBUSY;
goto out;
@@ -577,7 +577,7 @@ int gnttab_copy_grant_page(grant_ref_t r
BUG_ON(err);
BUG_ON(unmap.status);
- write_sequnlock(&gnttab_dma_lock);
+ write_sequnlock_bh(&gnttab_dma_lock);
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
set_phys_to_machine(page_to_pfn(new_page), INVALID_P2M_ENTRY);
--- sle11sp1-2010-03-05.orig/drivers/xen/netback/common.h 2010-01-14 08:40:17.000000000 +0100
+++ sle11sp1-2010-03-05/drivers/xen/netback/common.h 2010-01-14 08:45:38.000000000 +0100
@@ -245,8 +245,16 @@ struct netbk_tx_pending_inuse {
#define MAX_MFN_ALLOC 64
struct xen_netbk {
- struct tasklet_struct net_tx_tasklet;
- struct tasklet_struct net_rx_tasklet;
+ union {
+ struct {
+ struct tasklet_struct net_tx_tasklet;
+ struct tasklet_struct net_rx_tasklet;
+ };
+ struct {
+ wait_queue_head_t netbk_action_wq;
+ struct task_struct *task;
+ };
+ };
struct sk_buff_head rx_queue;
struct sk_buff_head tx_queue;
--- sle11sp1-2010-03-05.orig/drivers/xen/netback/netback.c 2010-03-08 10:54:19.000000000 +0100
+++ sle11sp1-2010-03-05/drivers/xen/netback/netback.c 2010-03-08 10:56:45.000000000 +0100
@@ -35,6 +35,7 @@
*/
#include "common.h"
+#include <linux/kthread.h>
#include <linux/vmalloc.h>
#include <xen/balloon.h>
#include <xen/interface/memory.h>
@@ -43,6 +44,8 @@
struct xen_netbk *xen_netbk;
unsigned int netbk_nr_groups;
+static bool use_kthreads = true;
+static bool __initdata bind_threads;
#define GET_GROUP_INDEX(netif) ((netif)->group)
@@ -94,7 +97,11 @@ static int MODPARM_permute_returns = 0;
module_param_named(permute_returns, MODPARM_permute_returns, bool, S_IRUSR|S_IWUSR);
MODULE_PARM_DESC(permute_returns, "Randomly permute the order in which TX responses are sent to the frontend");
module_param_named(groups, netbk_nr_groups, uint, 0);
-MODULE_PARM_DESC(groups, "Specify the number of tasklet pairs to use");
+MODULE_PARM_DESC(groups, "Specify the number of tasklet pairs/threads to use");
+module_param_named(tasklets, use_kthreads, invbool, 0);
+MODULE_PARM_DESC(tasklets, "Use tasklets instead of kernel threads");
+module_param_named(bind, bind_threads, bool, 0);
+MODULE_PARM_DESC(bind, "Bind kernel threads to (v)CPUs");
int netbk_copy_skb_mode;
@@ -131,8 +138,12 @@ static inline void maybe_schedule_tx_act
smp_mb();
if ((nr_pending_reqs(netbk) < (MAX_PENDING_REQS/2)) &&
- !list_empty(&netbk->net_schedule_list))
- tasklet_schedule(&netbk->net_tx_tasklet);
+ !list_empty(&netbk->net_schedule_list)) {
+ if (use_kthreads)
+ wake_up(&netbk->netbk_action_wq);
+ else
+ tasklet_schedule(&netbk->net_tx_tasklet);
+ }
}
static struct sk_buff *netbk_copy_skb(struct sk_buff *skb)
@@ -293,7 +304,10 @@ int netif_be_start_xmit(struct sk_buff *
netbk = &xen_netbk[GET_GROUP_INDEX(netif)];
skb_queue_tail(&netbk->rx_queue, skb);
- tasklet_schedule(&netbk->net_rx_tasklet);
+ if (use_kthreads)
+ wake_up(&netbk->netbk_action_wq);
+ else
+ tasklet_schedule(&netbk->net_rx_tasklet);
return NETDEV_TX_OK;
@@ -749,8 +763,12 @@ static void net_rx_action(unsigned long
/* More work to do? */
if (!skb_queue_empty(&netbk->rx_queue) &&
- !timer_pending(&netbk->net_timer))
- tasklet_schedule(&netbk->net_rx_tasklet);
+ !timer_pending(&netbk->net_timer)) {
+ if (use_kthreads)
+ wake_up(&netbk->netbk_action_wq);
+ else
+ tasklet_schedule(&netbk->net_rx_tasklet);
+ }
#if 0
else
xen_network_done_notify();
@@ -759,12 +777,18 @@ static void net_rx_action(unsigned long
static void net_alarm(unsigned long group)
{
- tasklet_schedule(&xen_netbk[group].net_rx_tasklet);
+ if (use_kthreads)
+ wake_up(&xen_netbk[group].netbk_action_wq);
+ else
+ tasklet_schedule(&xen_netbk[group].net_rx_tasklet);
}
static void netbk_tx_pending_timeout(unsigned long group)
{
- tasklet_schedule(&xen_netbk[group].net_tx_tasklet);
+ if (use_kthreads)
+ wake_up(&xen_netbk[group].netbk_action_wq);
+ else
+ tasklet_schedule(&xen_netbk[group].net_tx_tasklet);
}
struct net_device_stats *netif_be_get_stats(struct net_device *dev)
@@ -1476,7 +1500,10 @@ static void net_tx_action(unsigned long
continue;
}
- netif_rx(skb);
+ if (use_kthreads)
+ netif_rx_ni(skb);
+ else
+ netif_rx(skb);
netif->dev->last_rx = jiffies;
}
@@ -1502,7 +1529,10 @@ static void netif_idx_release(struct xen
netbk->dealloc_prod++;
spin_unlock_irqrestore(&netbk->release_lock, flags);
- tasklet_schedule(&netbk->net_tx_tasklet);
+ if (use_kthreads)
+ wake_up(&netbk->netbk_action_wq);
+ else
+ tasklet_schedule(&netbk->net_tx_tasklet);
}
static void netif_page_release(struct page *page, unsigned int order)
@@ -1641,6 +1671,45 @@ static struct irqaction netif_be_dbg_act
};
#endif
+static inline int rx_work_todo(struct xen_netbk *netbk)
+{
+ return !skb_queue_empty(&netbk->rx_queue);
+}
+
+static inline int tx_work_todo(struct xen_netbk *netbk)
+{
+ if (netbk->dealloc_cons != netbk->dealloc_prod)
+ return 1;
+
+ if (nr_pending_reqs(netbk) + MAX_SKB_FRAGS < MAX_PENDING_REQS &&
+ !list_empty(&netbk->net_schedule_list))
+ return 1;
+
+ return 0;
+}
+
+static int netbk_action_thread(void *index)
+{
+ unsigned long group = (unsigned long)index;
+ struct xen_netbk *netbk = &xen_netbk[group];
+
+ while (1) {
+ wait_event_interruptible(netbk->netbk_action_wq,
+ rx_work_todo(netbk) ||
+ tx_work_todo(netbk));
+ cond_resched();
+
+ if (rx_work_todo(netbk))
+ net_rx_action(group);
+
+ if (tx_work_todo(netbk))
+ net_tx_action(group);
+ }
+
+ return 0;
+}
+
+
static int __init netback_init(void)
{
unsigned int i, group;
@@ -1666,8 +1735,26 @@ static int __init netback_init(void)
for (group = 0; group < netbk_nr_groups; group++) {
struct xen_netbk *netbk = &xen_netbk[group];
- tasklet_init(&netbk->net_tx_tasklet, net_tx_action, group);
- tasklet_init(&netbk->net_rx_tasklet, net_rx_action, group);
+ if (use_kthreads) {
+ init_waitqueue_head(&netbk->netbk_action_wq);
+ netbk->task = kthread_create(netbk_action_thread,
+ (void *)(long)group,
+ "netback/%u", group);
+
+ if (!IS_ERR(netbk->task)) {
+ if (bind_threads)
+ kthread_bind(netbk->task, group);
+ wake_up_process(netbk->task);
+ } else {
+ printk(KERN_ALERT
+ "kthread_create() fails at netback\n");
+ rc = PTR_ERR(netbk->task);
+ goto failed_init;
+ }
+ } else {
+ tasklet_init(&netbk->net_tx_tasklet, net_tx_action, group);
+ tasklet_init(&netbk->net_rx_tasklet, net_rx_action, group);
+ }
skb_queue_head_init(&netbk->rx_queue);
skb_queue_head_init(&netbk->tx_queue);
@@ -1736,8 +1823,11 @@ failed_init:
while (group-- > 0) {
struct xen_netbk *netbk = &xen_netbk[group];
- free_empty_pages_and_pagevec(netbk->mmap_pages,
- MAX_PENDING_REQS);
+ if (use_kthreads && netbk->task && !IS_ERR(netbk->task))
+ kthread_stop(netbk->task);
+ if (netbk->mmap_pages)
+ free_empty_pages_and_pagevec(netbk->mmap_pages,
+ MAX_PENDING_REQS);
del_timer(&netbk->tx_pending_timer);
del_timer(&netbk->net_timer);
}
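The netbk_action_thread() loop added above only runs when rx_work_todo() or tx_work_todo() reports pending work. A user-space model of the tx_work_todo() predicate; the struct layout and constants are simplified stand-ins for the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel constants. */
#define MAX_PENDING_REQS 256
#define MAX_SKB_FRAGS     18

struct netbk_model {
    unsigned int dealloc_cons, dealloc_prod;
    unsigned int nr_pending;        /* models nr_pending_reqs() */
    bool schedule_list_empty;       /* models list_empty(&net_schedule_list) */
};

/* The thread has TX work if there are unprocessed deallocations, or if
 * there is both ring headroom and at least one interface queued. */
static bool tx_work_todo(const struct netbk_model *nb)
{
    if (nb->dealloc_cons != nb->dealloc_prod)
        return true;
    return nb->nr_pending + MAX_SKB_FRAGS < MAX_PENDING_REQS &&
           !nb->schedule_list_empty;
}
```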

From: Dongxiao Xu <dongxiao.xu@intel.com>
Subject: [PATCH 2/3] Netback: Multiple tasklets support.
Patch-mainline: n/a
Now netback uses one pair of tasklets for Tx/Rx data transactions. A netback
tasklet can only run on one CPU at a time, and this single pair serves all the
netfronts, so it has become a performance bottleneck. This patch replaces the
current single pair in dom0 with multiple tasklet pairs.
Assuming Dom0 has CPUNR VCPUs, we define CPUNR tasklet pairs (CPUNR for Tx, and
CPUNR for Rx). Each pair of tasklets serves a specific group of netfronts. The
global and static variables are also duplicated per group in order to avoid
spinlock contention.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
jb: some cleanups
Acked-by: jbeulich@novell.com
--- sle11sp1-2010-03-05.orig/drivers/xen/netback/common.h 2010-01-14 08:39:00.000000000 +0100
+++ sle11sp1-2010-03-05/drivers/xen/netback/common.h 2010-01-14 08:40:17.000000000 +0100
@@ -58,6 +58,7 @@
typedef struct netif_st {
/* Unique identifier for this interface. */
domid_t domid;
+ unsigned int group;
unsigned int handle;
u8 fe_dev_addr[6];
@@ -99,6 +100,7 @@ typedef struct netif_st {
/* Miscellaneous private stuff. */
struct list_head list; /* scheduling list */
+ struct list_head group_list;
atomic_t refcnt;
struct net_device *dev;
struct net_device_stats stats;
@@ -259,12 +261,15 @@ struct xen_netbk {
struct list_head pending_inuse_head;
struct list_head net_schedule_list;
+ struct list_head group_domain_list;
spinlock_t net_schedule_list_lock;
spinlock_t release_lock;
+ spinlock_t group_domain_list_lock;
struct page **mmap_pages;
+ unsigned int group_domain_nr;
unsigned int alloc_index;
struct page_ext page_extinfo[MAX_PENDING_REQS];
@@ -294,4 +299,8 @@ struct xen_netbk {
unsigned long mfn_list[MAX_MFN_ALLOC];
};
+
+extern struct xen_netbk *xen_netbk;
+extern unsigned int netbk_nr_groups;
+
#endif /* __NETIF__BACKEND__COMMON_H__ */
--- sle11sp1-2010-03-05.orig/drivers/xen/netback/interface.c 2010-01-04 13:31:46.000000000 +0100
+++ sle11sp1-2010-03-05/drivers/xen/netback/interface.c 2010-03-05 10:35:22.000000000 +0100
@@ -54,14 +54,41 @@ module_param_named(queue_length, netbk_q
static void __netif_up(netif_t *netif)
{
+ unsigned int group = 0;
+ unsigned int min_domains = xen_netbk[0].group_domain_nr;
+ unsigned int i;
+
+ /* Find the list which contains least number of domains. */
+ for (i = 1; i < netbk_nr_groups; i++) {
+ if (xen_netbk[i].group_domain_nr < min_domains) {
+ group = i;
+ min_domains = xen_netbk[i].group_domain_nr;
+ }
+ }
+
+ spin_lock(&xen_netbk[group].group_domain_list_lock);
+ list_add_tail(&netif->group_list,
+ &xen_netbk[group].group_domain_list);
+ xen_netbk[group].group_domain_nr++;
+ spin_unlock(&xen_netbk[group].group_domain_list_lock);
+ netif->group = group;
+
enable_irq(netif->irq);
netif_schedule_work(netif);
}
static void __netif_down(netif_t *netif)
{
+ struct xen_netbk *netbk = xen_netbk + netif->group;
+
disable_irq(netif->irq);
netif_deschedule_work(netif);
+
+ netif->group = UINT_MAX;
+ spin_lock(&netbk->group_domain_list_lock);
+ netbk->group_domain_nr--;
+ list_del(&netif->group_list);
+ spin_unlock(&netbk->group_domain_list_lock);
}
static int net_open(struct net_device *dev)
@@ -203,6 +230,7 @@ netif_t *netif_alloc(struct device *pare
netif = netdev_priv(dev);
memset(netif, 0, sizeof(*netif));
netif->domid = domid;
+ netif->group = UINT_MAX;
netif->handle = handle;
atomic_set(&netif->refcnt, 1);
init_waitqueue_head(&netif->waiting_to_free);
--- sle11sp1-2010-03-05.orig/drivers/xen/netback/netback.c 2010-01-29 12:51:48.000000000 +0100
+++ sle11sp1-2010-03-05/drivers/xen/netback/netback.c 2010-03-08 10:54:19.000000000 +0100
@@ -41,10 +41,10 @@
/*define NETBE_DEBUG_INTERRUPT*/
-static struct xen_netbk *xen_netbk;
-static unsigned int netbk_nr_groups = 1;
+struct xen_netbk *xen_netbk;
+unsigned int netbk_nr_groups;
-#define GET_GROUP_INDEX(netif) (0)
+#define GET_GROUP_INDEX(netif) ((netif)->group)
static void netif_idx_release(struct xen_netbk *, u16 pending_idx);
static void make_tx_response(netif_t *netif,
@@ -93,6 +93,8 @@ MODULE_PARM_DESC(copy_skb, "Copy data re
static int MODPARM_permute_returns = 0;
module_param_named(permute_returns, MODPARM_permute_returns, bool, S_IRUSR|S_IWUSR);
MODULE_PARM_DESC(permute_returns, "Randomly permute the order in which TX responses are sent to the frontend");
+module_param_named(groups, netbk_nr_groups, uint, 0);
+MODULE_PARM_DESC(groups, "Specify the number of tasklet pairs to use");
int netbk_copy_skb_mode;
@@ -1519,9 +1521,20 @@ static void netif_page_release(struct pa
irqreturn_t netif_be_int(int irq, void *dev_id)
{
netif_t *netif = dev_id;
+ unsigned int group = GET_GROUP_INDEX(netif);
+
+ if (unlikely(group >= netbk_nr_groups)) {
+ /*
+ * Short of having a way to bind the IRQ in disabled mode
+ * (IRQ_NOAUTOEN), we have to ignore the first invocation(s)
+ * (before we got assigned to a group).
+ */
+ BUG_ON(group != UINT_MAX);
+ return IRQ_HANDLED;
+ }
add_to_net_schedule_list_tail(netif);
- maybe_schedule_tx_action(GET_GROUP_INDEX(netif));
+ maybe_schedule_tx_action(group);
if (netif_schedulable(netif) && !netbk_queue_full(netif))
netif_wake_queue(netif->dev);
@@ -1637,8 +1650,11 @@ static int __init netback_init(void)
if (!is_running_on_xen())
return -ENODEV;
+ if (!netbk_nr_groups)
+ netbk_nr_groups = (num_online_cpus() + 1) / 2;
+
/* We can increase reservation by this much in net_rx_action(). */
- balloon_update_driver_allowance(NET_RX_RING_SIZE);
+ balloon_update_driver_allowance(netbk_nr_groups * NET_RX_RING_SIZE);
xen_netbk = __vmalloc(netbk_nr_groups * sizeof(*xen_netbk),
GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO, PAGE_KERNEL);
@@ -1677,9 +1693,11 @@ static int __init netback_init(void)
INIT_LIST_HEAD(&netbk->pending_inuse_head);
INIT_LIST_HEAD(&netbk->net_schedule_list);
+ INIT_LIST_HEAD(&netbk->group_domain_list);
spin_lock_init(&netbk->net_schedule_list_lock);
spin_lock_init(&netbk->release_lock);
+ spin_lock_init(&netbk->group_domain_list_lock);
for (i = 0; i < MAX_PENDING_REQS; i++) {
page = netbk->mmap_pages[i];
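The __netif_up() hunk above assigns a newly enabled interface to the tasklet group that currently serves the fewest domains. That selection loop, modeled as a standalone function (names are illustrative):

```c
#include <assert.h>

/* Model of the group-selection loop in __netif_up(): pick the group with
 * the smallest group_domain_nr; ties go to the lowest index. */
static unsigned int pick_group(const unsigned int *group_domain_nr,
                               unsigned int nr_groups)
{
    unsigned int group = 0, min_domains = group_domain_nr[0], i;

    for (i = 1; i < nr_groups; i++) {
        if (group_domain_nr[i] < min_domains) {
            group = i;
            min_domains = group_domain_nr[i];
        }
    }
    return group;
}
```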

xen-netback-notify-multi
From: jbeulich@novell.com
Subject: netback: use multicall for send multiple notifications
Patch-mainline: obsolete
This is also a small fairness improvement, since notifications are now sent
in the order the requests came in rather than in reverse order.
--- sle11sp1-2010-02-09.orig/drivers/xen/core/evtchn.c 2010-02-09 17:18:55.000000000 +0100
+++ sle11sp1-2010-02-09/drivers/xen/core/evtchn.c 2010-02-09 17:19:07.000000000 +0100
@@ -1335,6 +1335,21 @@ void notify_remote_via_irq(int irq)
}
EXPORT_SYMBOL_GPL(notify_remote_via_irq);
+int multi_notify_remote_via_irq(multicall_entry_t *mcl, int irq)
+{
+ int evtchn = evtchn_from_irq(irq);
+
+ BUG_ON(type_from_irq(irq) == IRQT_VIRQ);
+ BUG_IF_IPI(irq);
+
+ if (!VALID_EVTCHN(evtchn))
+ return -EINVAL;
+
+ multi_notify_remote_via_evtchn(mcl, evtchn);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(multi_notify_remote_via_irq);
+
int irq_to_evtchn_port(int irq)
{
BUG_IF_VIRQ_PER_CPU(irq);
--- sle11sp1-2010-02-09.orig/drivers/xen/netback/netback.c 2010-01-04 13:31:44.000000000 +0100
+++ sle11sp1-2010-02-09/drivers/xen/netback/netback.c 2010-01-04 13:31:57.000000000 +0100
@@ -778,10 +778,20 @@ static void net_rx_action(unsigned long
npo.meta_cons += nr_frags + 1;
}
- while (notify_nr != 0) {
- irq = notify_list[--notify_nr];
+ if (notify_nr == 1) {
+ irq = *notify_list;
__clear_bit(irq, rx_notify);
notify_remote_via_irq(irq + DYNIRQ_BASE);
+ } else {
+ for (count = ret = 0; ret < notify_nr; ++ret) {
+ irq = notify_list[ret];
+ __clear_bit(irq, rx_notify);
+ if (!multi_notify_remote_via_irq(rx_mcl + count,
+ irq + DYNIRQ_BASE))
+ ++count;
+ }
+ if (HYPERVISOR_multicall(rx_mcl, count))
+ BUG();
}
/* More work to do? */
--- sle11sp1-2010-02-09.orig/include/xen/evtchn.h 2009-12-18 10:13:32.000000000 +0100
+++ sle11sp1-2010-02-09/include/xen/evtchn.h 2009-12-18 10:13:40.000000000 +0100
@@ -193,6 +193,18 @@ static inline void notify_remote_via_evt
VOID(HYPERVISOR_event_channel_op(EVTCHNOP_send, &send));
}
+static inline void
+multi_notify_remote_via_evtchn(multicall_entry_t *mcl, int port)
+{
+ struct evtchn_send *send = (void *)(mcl->args + 2);
+
+ BUILD_BUG_ON(sizeof(*send) > sizeof(mcl->args) - 2 * sizeof(*mcl->args));
+ send->port = port;
+ mcl->op = __HYPERVISOR_event_channel_op;
+ mcl->args[0] = EVTCHNOP_send;
+ mcl->args[1] = (unsigned long)send;
+}
+
/* Clear an irq's pending state, in preparation for polling on it. */
void xen_clear_irq_pending(int irq);
@@ -211,6 +223,7 @@ void xen_poll_irq(int irq);
* by bind_*_to_irqhandler().
*/
void notify_remote_via_irq(int irq);
+int multi_notify_remote_via_irq(multicall_entry_t *, int irq);
int irq_to_evtchn_port(int irq);
#if defined(CONFIG_SMP) && !defined(MODULE) && defined(CONFIG_X86)
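multi_notify_remote_via_evtchn() above packs the evtchn_send argument block into the unused tail of the multicall entry's own args[] array, so batching needs no extra storage. A user-space model of that packing; the struct layout and the hypercall/sub-op numbers are simplified, illustrative stand-ins:

```c
#include <assert.h>

/* Simplified stand-ins for evtchn_send and multicall_entry_t. */
struct evtchn_send_model { unsigned int port; };
struct mcl_model {
    unsigned long op;
    unsigned long args[6];
};

static void fill_notify(struct mcl_model *mcl, unsigned int port)
{
    /* Stash the argument struct in args[2..5], as the patch does. */
    struct evtchn_send_model *send = (void *)(mcl->args + 2);

    /* mirrors the BUILD_BUG_ON: the struct must fit in the tail */
    assert(sizeof(*send) <= sizeof(mcl->args) - 2 * sizeof(mcl->args[0]));
    send->port = port;
    mcl->op = 32;          /* __HYPERVISOR_event_channel_op (illustrative) */
    mcl->args[0] = 4;      /* EVTCHNOP_send (illustrative) */
    mcl->args[1] = (unsigned long)send;
}
```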

xen-netback-nr-irqs
From: jbeulich@novell.com
Subject: netback: reduce overhead of IRQ recording
Patch-mainline: obsolete
Since both NR_PIRQS and NR_DYNIRQS are no longer hardcoded, the
(memory) overhead of tracking which ones to send notifications to can
be essentially unbounded. Also, store the dynirq rather than the raw irq
to push up the limit where the type of notify_list needs to become
'int' rather than 'u16'.
--- head-2010-01-04.orig/drivers/xen/netback/interface.c 2010-01-04 12:42:38.000000000 +0100
+++ head-2010-01-04/drivers/xen/netback/interface.c 2010-01-04 13:31:46.000000000 +0100
@@ -339,6 +339,7 @@ int netif_map(netif_t *netif, unsigned l
netif->dev->name, netif);
if (err < 0)
goto err_hypervisor;
+ BUG_ON(err < DYNIRQ_BASE || err >= DYNIRQ_BASE + NR_DYNIRQS);
netif->irq = err;
disable_irq(netif->irq);
--- head-2010-01-04.orig/drivers/xen/netback/netback.c 2010-01-04 13:31:38.000000000 +0100
+++ head-2010-01-04/drivers/xen/netback/netback.c 2010-01-04 13:31:44.000000000 +0100
@@ -590,8 +590,12 @@ static void net_rx_action(unsigned long
static mmu_update_t rx_mmu[NET_RX_RING_SIZE];
static gnttab_transfer_t grant_trans_op[NET_RX_RING_SIZE];
static gnttab_copy_t grant_copy_op[NET_RX_RING_SIZE];
- static unsigned char rx_notify[NR_IRQS];
+ static DECLARE_BITMAP(rx_notify, NR_DYNIRQS);
+#if NR_DYNIRQS <= 0x10000
static u16 notify_list[NET_RX_RING_SIZE];
+#else
+ static int notify_list[NET_RX_RING_SIZE];
+#endif
static struct netbk_rx_meta meta[NET_RX_RING_SIZE];
struct netrx_pending_operations npo = {
@@ -749,11 +753,9 @@ static void net_rx_action(unsigned long
nr_frags);
RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&netif->rx, ret);
- irq = netif->irq;
- if (ret && !rx_notify[irq]) {
- rx_notify[irq] = 1;
+ irq = netif->irq - DYNIRQ_BASE;
+ if (ret && !__test_and_set_bit(irq, rx_notify))
notify_list[notify_nr++] = irq;
- }
if (netif_queue_stopped(netif->dev) &&
netif_schedulable(netif) &&
@@ -778,8 +780,8 @@ static void net_rx_action(unsigned long
while (notify_nr != 0) {
irq = notify_list[--notify_nr];
- rx_notify[irq] = 0;
- notify_remote_via_irq(irq);
+ __clear_bit(irq, rx_notify);
+ notify_remote_via_irq(irq + DYNIRQ_BASE);
}
/* More work to do? */
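The net_rx_action() hunk above replaces the per-IRQ byte array with a bitmap and uses __test_and_set_bit() so each dynirq is queued at most once per pass. A user-space model of that de-duplication; the bit helper is a non-atomic stand-in:

```c
#include <assert.h>
#include <limits.h>

#define NR_DYNIRQS_MODEL 256    /* illustrative size */
#define BITS_PER_LONG_MODEL (sizeof(unsigned long) * CHAR_BIT)

static unsigned long rx_notify[NR_DYNIRQS_MODEL / BITS_PER_LONG_MODEL];
static unsigned short notify_list[NR_DYNIRQS_MODEL];
static unsigned int notify_nr;

/* Non-atomic model of __test_and_set_bit(). */
static int test_and_set_bit_model(unsigned int nr, unsigned long *addr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_LONG_MODEL);
    unsigned long *p = addr + nr / BITS_PER_LONG_MODEL;
    int old = (*p & mask) != 0;

    *p |= mask;
    return old;
}

/* Queue a dynirq for notification at most once. */
static void queue_notify(unsigned int dynirq)
{
    if (!test_and_set_bit_model(dynirq, rx_notify))
        notify_list[notify_nr++] = dynirq;
}
```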

xen-netfront-ethtool
From: ksrinivasan@novell.com
Subject: netfront: ethtool -i does not return info about xennet driver
Patch-mainline: n/a
References: bnc#591179
Signed-off-by: K. Y. Srinivasan <ksrinivasan@novell.com>
--- sle11sp1-2010-03-29.orig/drivers/xen/netfront/netfront.c 2009-11-06 10:52:23.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/netfront/netfront.c 2010-03-27 00:18:30.000000000 +0100
@@ -1766,6 +1766,13 @@ static void xennet_set_features(struct n
xennet_set_tso(dev, 1);
}
+static void netfront_get_drvinfo(struct net_device *dev,
+ struct ethtool_drvinfo *info)
+{
+ strcpy(info->driver, "netfront");
+ strcpy(info->bus_info, dev_name(dev->dev.parent));
+}
+
static int network_connect(struct net_device *dev)
{
struct netfront_info *np = netdev_priv(dev);
@@ -1874,6 +1881,7 @@ static void netif_uninit(struct net_devi
static const struct ethtool_ops network_ethtool_ops =
{
+ .get_drvinfo = netfront_get_drvinfo,
.get_tx_csum = ethtool_op_get_tx_csum,
.set_tx_csum = ethtool_op_set_tx_csum,
.get_sg = ethtool_op_get_sg,
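netfront_get_drvinfo() above simply fills in the driver name and the parent device's bus info so that `ethtool -i` has something to report. A minimal model; the struct is a simplified stand-in for struct ethtool_drvinfo, and the bus string in the usage is hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for struct ethtool_drvinfo. */
struct drvinfo_model { char driver[32]; char bus_info[32]; };

/* Model of netfront_get_drvinfo(): report the driver name and the
 * parent device's bus info. */
static void fill_drvinfo(struct drvinfo_model *info, const char *bus)
{
    strcpy(info->driver, "netfront");
    strcpy(info->bus_info, bus);
}
```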

xen-op-packet
From: plc@novell.com
Subject: add support for new operation type BLKIF_OP_PACKET
Patch-mainline: obsolete
References: fate#300964
--- sle11sp1-2010-03-22.orig/drivers/xen/blkback/blkback.c 2010-03-22 12:26:12.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blkback/blkback.c 2010-03-22 12:57:07.000000000 +0100
@@ -195,13 +195,15 @@ static void fast_flush_area(pending_req_
static void print_stats(blkif_t *blkif)
{
- printk(KERN_DEBUG "%s: oo %3d | rd %4d | wr %4d | br %4d\n",
+ printk(KERN_DEBUG "%s: oo %3d | rd %4d | wr %4d | br %4d | pk %4d\n",
current->comm, blkif->st_oo_req,
- blkif->st_rd_req, blkif->st_wr_req, blkif->st_br_req);
+ blkif->st_rd_req, blkif->st_wr_req, blkif->st_br_req,
+ blkif->st_pk_req);
blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
blkif->st_rd_req = 0;
blkif->st_wr_req = 0;
blkif->st_oo_req = 0;
+ blkif->st_pk_req = 0;
}
int blkif_schedule(void *arg)
@@ -374,6 +376,13 @@ handle_request:
blkif->st_wr_req++;
ret = dispatch_rw_block_io(blkif, &req, pending_req);
break;
+ case BLKIF_OP_PACKET:
+ DPRINTK("error: block operation BLKIF_OP_PACKET not implemented\n");
+ blkif->st_pk_req++;
+ make_response(blkif, req.id, req.operation,
+ BLKIF_RSP_ERROR);
+ free_req(pending_req);
+ break;
default:
/* A good sign something is wrong: sleep for a while to
* avoid excessive CPU consumption by a bad guest. */
--- sle11sp1-2010-03-22.orig/drivers/xen/blkback/common.h 2010-03-22 12:54:11.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blkback/common.h 2010-03-22 12:57:06.000000000 +0100
@@ -92,6 +92,7 @@ typedef struct blkif_st {
int st_wr_req;
int st_oo_req;
int st_br_req;
+ int st_pk_req;
int st_rd_sect;
int st_wr_sect;
--- sle11sp1-2010-03-22.orig/drivers/xen/blkfront/blkfront.c 2010-03-22 12:26:04.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blkfront/blkfront.c 2010-03-22 12:57:12.000000000 +0100
@@ -671,6 +671,8 @@ static int blkif_queue_request(struct re
BLKIF_OP_WRITE : BLKIF_OP_READ;
if (blk_barrier_rq(req))
ring_req->operation = BLKIF_OP_WRITE_BARRIER;
+ if (blk_pc_request(req))
+ ring_req->operation = BLKIF_OP_PACKET;
ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg);
BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST);
@@ -728,7 +730,7 @@ void do_blkif_request(struct request_que
blk_start_request(req);
- if (!blk_fs_request(req)) {
+ if (!blk_fs_request(req) && !blk_pc_request(req)) {
__blk_end_request_all(req, -EIO);
continue;
}
@@ -799,6 +801,7 @@ static irqreturn_t blkif_int(int irq, vo
/* fall through */
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
+ case BLKIF_OP_PACKET:
if (unlikely(bret->status != BLKIF_RSP_OKAY))
DPRINTK("Bad return from blkdev data "
"request: %x\n", bret->status);
--- sle11sp1-2010-03-22.orig/drivers/xen/blktap/blktap.c 2010-01-04 13:22:24.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blktap/blktap.c 2010-01-04 13:22:46.000000000 +0100
@@ -1134,13 +1134,14 @@ static void fast_flush_area(pending_req_
static void print_stats(blkif_t *blkif)
{
- printk(KERN_DEBUG "%s: oo %3d | rd %4d | wr %4d\n",
+ printk(KERN_DEBUG "%s: oo %3d | rd %4d | wr %4d | pk %4d\n",
current->comm, blkif->st_oo_req,
- blkif->st_rd_req, blkif->st_wr_req);
+ blkif->st_rd_req, blkif->st_wr_req, blkif->st_pk_req);
blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000);
blkif->st_rd_req = 0;
blkif->st_wr_req = 0;
blkif->st_oo_req = 0;
+ blkif->st_pk_req = 0;
}
int tap_blkif_schedule(void *arg)
@@ -1374,6 +1375,11 @@ static int do_block_io_op(blkif_t *blkif
dispatch_rw_block_io(blkif, &req, pending_req);
break;
+ case BLKIF_OP_PACKET:
+ blkif->st_pk_req++;
+ dispatch_rw_block_io(blkif, &req, pending_req);
+ break;
+
default:
/* A good sign something is wrong: sleep for a while to
* avoid excessive CPU consumption by a bad guest. */
@@ -1413,6 +1419,8 @@ static void dispatch_rw_block_io(blkif_t
struct vm_area_struct *vma = NULL;
switch (req->operation) {
+ case BLKIF_OP_PACKET:
+ /* Fall through */
case BLKIF_OP_READ:
operation = READ;
break;
--- sle11sp1-2010-03-22.orig/drivers/xen/blktap/common.h 2009-11-06 10:51:07.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blktap/common.h 2009-07-29 10:18:11.000000000 +0200
@@ -75,6 +75,7 @@ typedef struct blkif_st {
int st_rd_req;
int st_wr_req;
int st_oo_req;
+ int st_pk_req;
int st_rd_sect;
int st_wr_sect;
--- sle11sp1-2010-03-22.orig/drivers/xen/blktap2/blktap.h 2009-12-16 11:51:26.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blktap2/blktap.h 2009-12-16 12:14:37.000000000 +0100
@@ -137,6 +137,7 @@ struct blktap_statistics {
int st_rd_req;
int st_wr_req;
int st_oo_req;
+ int st_pk_req;
int st_rd_sect;
int st_wr_sect;
s64 st_rd_cnt;
--- sle11sp1-2010-03-22.orig/drivers/xen/blktap2/device.c 2009-11-06 10:52:23.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/blktap2/device.c 2010-01-04 13:22:52.000000000 +0100
@@ -369,7 +369,8 @@ blktap_device_fail_pending_requests(stru
BTERR("%u:%u: failing pending %s of %d pages\n",
blktap_device_major, tap->minor,
- (request->operation == BLKIF_OP_READ ?
+ (request->operation == BLKIF_OP_PACKET ?
+ "packet" : request->operation == BLKIF_OP_READ ?
"read" : "write"), request->nr_pages);
blktap_unmap(tap, request);
@@ -410,6 +411,7 @@ blktap_device_finish_request(struct blkt
switch (request->operation) {
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
+ case BLKIF_OP_PACKET:
if (unlikely(res->status != BLKIF_RSP_OKAY))
BTERR("Bad return from device data "
"request: %x\n", res->status);
@@ -648,6 +650,8 @@ blktap_device_process_request(struct blk
blkif_req.handle = 0;
blkif_req.operation = rq_data_dir(req) ?
BLKIF_OP_WRITE : BLKIF_OP_READ;
+ if (unlikely(blk_pc_request(req)))
+ blkif_req.operation = BLKIF_OP_PACKET;
request->id = (unsigned long)req;
request->operation = blkif_req.operation;
@@ -713,7 +717,9 @@ blktap_device_process_request(struct blk
wmb(); /* blktap_poll() reads req_prod_pvt asynchronously */
ring->ring.req_prod_pvt++;
- if (rq_data_dir(req)) {
+ if (unlikely(blk_pc_request(req)))
+ tap->stats.st_pk_req++;
+ else if (rq_data_dir(req)) {
tap->stats.st_wr_sect += nr_sects;
tap->stats.st_wr_req++;
} else {
--- sle11sp1-2010-03-22.orig/include/xen/interface/io/blkif.h 2009-12-04 10:44:50.000000000 +0100
+++ sle11sp1-2010-03-22/include/xen/interface/io/blkif.h 2009-07-29 10:18:11.000000000 +0200
@@ -76,6 +76,10 @@
* "feature-flush-cache" node!
*/
#define BLKIF_OP_FLUSH_DISKCACHE 3
+/*
+ * Device specific command packet contained within the request
+ */
+#define BLKIF_OP_PACKET 4
/*
* Maximum scatter/gather segments per request.

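For orientation, the backend-side handling that the xen-op-packet patch adds can be sketched outside the kernel as a plain dispatch over the `BLKIF_OP_*` codes. This is a minimal userspace model, not kernel code: `blkif_stats` and `handle_op` are illustrative stand-ins for the counters in `blkif_st` and for `do_block_io_op()`, and it mirrors the fact that blkback counts `BLKIF_OP_PACKET` in `st_pk_req` but answers it with `BLKIF_RSP_ERROR` because the operation is not implemented there:

```c
#include <assert.h>

/* Opcodes as defined in xen/interface/io/blkif.h (after the patch). */
enum blkif_op {
    BLKIF_OP_READ            = 0,
    BLKIF_OP_WRITE           = 1,
    BLKIF_OP_WRITE_BARRIER   = 2,
    BLKIF_OP_FLUSH_DISKCACHE = 3,
    BLKIF_OP_PACKET          = 4,
};

enum blkif_rsp { BLKIF_RSP_OKAY = 0, BLKIF_RSP_ERROR = -1 };

/* Illustrative stand-in for the per-interface counters in blkif_st. */
struct blkif_stats {
    int st_rd_req, st_wr_req, st_oo_req, st_pk_req;
};

/* Sketch of blkback's dispatch: reads/writes are accounted and would be
 * issued; BLKIF_OP_PACKET is counted but answered with BLKIF_RSP_ERROR
 * because blkback does not implement it. */
static enum blkif_rsp handle_op(struct blkif_stats *st, enum blkif_op op)
{
    switch (op) {
    case BLKIF_OP_READ:
        st->st_rd_req++;
        return BLKIF_RSP_OKAY;
    case BLKIF_OP_WRITE:
    case BLKIF_OP_WRITE_BARRIER:
        st->st_wr_req++;
        return BLKIF_RSP_OKAY;
    case BLKIF_OP_PACKET:
        st->st_pk_req++;
        return BLKIF_RSP_ERROR;  /* counted, then rejected */
    default:
        return BLKIF_RSP_ERROR;
    }
}
```

Note the asymmetry in the patch: blktap forwards `BLKIF_OP_PACKET` to `dispatch_rw_block_io()`, while blkback rejects it outright as modeled here.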
127
xen-sections Normal file

@@ -0,0 +1,127 @@
From: jbeulich@novell.com
Subject: fix placement of some routines/data
Patch-mainline: obsolete
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/time-xen.c 2010-02-09 17:07:46.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/time-xen.c 2010-03-01 14:45:54.000000000 +0100
@@ -674,7 +674,7 @@ int xen_update_persistent_clock(void)
/* Dynamically-mapped IRQ. */
DEFINE_PER_CPU(int, timer_irq);
-static void setup_cpu0_timer_irq(void)
+static void __init setup_cpu0_timer_irq(void)
{
per_cpu(timer_irq, 0) =
bind_virq_to_irqhandler(
@@ -899,7 +899,7 @@ int __cpuinit local_setup_timer(unsigned
return 0;
}
-void __cpuexit local_teardown_timer(unsigned int cpu)
+void __cpuinit local_teardown_timer(unsigned int cpu)
{
BUG_ON(cpu == 0);
unbind_from_irqhandler(per_cpu(timer_irq, cpu), NULL);
--- sle11sp1-2010-03-22.orig/drivers/xen/core/cpu_hotplug.c 2009-11-06 10:51:55.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/core/cpu_hotplug.c 2009-11-06 11:09:19.000000000 +0100
@@ -24,7 +24,7 @@ static int local_cpu_hotplug_request(voi
return (current->mm != NULL);
}
-static void vcpu_hotplug(unsigned int cpu)
+static void __cpuinit vcpu_hotplug(unsigned int cpu)
{
int err;
char dir[32], state[32];
@@ -51,7 +51,7 @@ static void vcpu_hotplug(unsigned int cp
}
}
-static void handle_vcpu_hotplug_event(
+static void __cpuinit handle_vcpu_hotplug_event(
struct xenbus_watch *watch, const char **vec, unsigned int len)
{
unsigned int cpu;
@@ -80,12 +80,12 @@ static int smpboot_cpu_notify(struct not
return NOTIFY_OK;
}
-static int setup_cpu_watcher(struct notifier_block *notifier,
- unsigned long event, void *data)
+static int __cpuinit setup_cpu_watcher(struct notifier_block *notifier,
+ unsigned long event, void *data)
{
unsigned int i;
- static struct xenbus_watch cpu_watch = {
+ static struct xenbus_watch __cpuinitdata cpu_watch = {
.node = "cpu",
.callback = handle_vcpu_hotplug_event,
.flags = XBWF_new_thread };
@@ -105,7 +105,7 @@ static int __init setup_vcpu_hotplug_eve
{
static struct notifier_block hotplug_cpu = {
.notifier_call = smpboot_cpu_notify };
- static struct notifier_block xsn_cpu = {
+ static struct notifier_block __cpuinitdata xsn_cpu = {
.notifier_call = setup_cpu_watcher };
if (!is_running_on_xen())
@@ -119,7 +119,7 @@ static int __init setup_vcpu_hotplug_eve
arch_initcall(setup_vcpu_hotplug_event);
-int smp_suspend(void)
+int __ref smp_suspend(void)
{
unsigned int cpu;
int err;
@@ -140,7 +140,7 @@ int smp_suspend(void)
return 0;
}
-void smp_resume(void)
+void __ref smp_resume(void)
{
unsigned int cpu;
--- sle11sp1-2010-03-22.orig/drivers/xen/core/smpboot.c 2010-03-22 12:25:59.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/core/smpboot.c 2010-03-22 12:57:24.000000000 +0100
@@ -181,7 +181,7 @@ static int __cpuinit xen_smp_intr_init(u
}
#ifdef CONFIG_HOTPLUG_CPU
-static void __cpuexit xen_smp_intr_exit(unsigned int cpu)
+static void __cpuinit xen_smp_intr_exit(unsigned int cpu)
{
if (cpu != 0)
local_teardown_timer(cpu);
@@ -400,7 +400,7 @@ int __cpuexit __cpu_disable(void)
return 0;
}
-void __cpuexit __cpu_die(unsigned int cpu)
+void __cpuinit __cpu_die(unsigned int cpu)
{
while (HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL)) {
current->state = TASK_UNINTERRUPTIBLE;
--- sle11sp1-2010-03-22.orig/drivers/xen/evtchn/evtchn.c 2009-03-18 10:39:31.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/evtchn/evtchn.c 2009-11-06 11:09:19.000000000 +0100
@@ -549,14 +549,15 @@ static int __init evtchn_init(void)
return 0;
}
+module_init(evtchn_init);
+#ifdef CONFIG_MODULE
static void __exit evtchn_cleanup(void)
{
misc_deregister(&evtchn_miscdev);
unregister_cpu_notifier(&evtchn_cpu_nfb);
}
-
-module_init(evtchn_init);
module_exit(evtchn_cleanup);
+#endif
MODULE_LICENSE("Dual BSD/GPL");

177
xen-spinlock-poll-early Normal file

@@ -0,0 +1,177 @@
From: jbeulich@novell.com
Subject: Go into polling mode early if lock owner is not running
Patch-mainline: n/a
This could be merged into the original ticket spinlock code once
validated, were it not for the dependency on smp-processor-id.h, which
is only introduced in the 2.6.32 merge.
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/spinlock.h 2010-02-23 14:24:59.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/spinlock.h 2010-02-23 14:28:29.000000000 +0100
@@ -41,6 +41,10 @@
#ifdef TICKET_SHIFT
#include <asm/irqflags.h>
+#include <asm/smp-processor-id.h>
+#include <xen/interface/vcpu.h>
+
+DECLARE_PER_CPU(struct vcpu_runstate_info, runstate);
int xen_spinlock_init(unsigned int cpu);
void xen_spinlock_cleanup(unsigned int cpu);
@@ -113,6 +117,9 @@ static __always_inline int __ticket_spin
:
: "memory", "cc");
+ if (tmp)
+ lock->owner = raw_smp_processor_id();
+
return tmp;
}
#elif TICKET_SHIFT == 16
@@ -179,10 +186,17 @@ static __always_inline int __ticket_spin
:
: "memory", "cc");
+ if (tmp)
+ lock->owner = raw_smp_processor_id();
+
return tmp;
}
#endif
+#define __ticket_spin_count(lock) \
+ (per_cpu(runstate.state, (lock)->owner) == RUNSTATE_running \
+ ? 1 << 10 : 1)
+
static inline int __ticket_spin_is_locked(raw_spinlock_t *lock)
{
int tmp = ACCESS_ONCE(lock->slock);
@@ -204,16 +218,18 @@ static __always_inline void __ticket_spi
bool free;
__ticket_spin_lock_preamble;
- if (likely(free)) {
+ if (likely(free))
+ raw_local_irq_restore(flags);
+ else {
+ token = xen_spin_adjust(lock, token);
raw_local_irq_restore(flags);
- return;
+ do {
+ count = __ticket_spin_count(lock);
+ __ticket_spin_lock_body;
+ } while (unlikely(!count)
+ && !xen_spin_wait(lock, &token, flags));
}
- token = xen_spin_adjust(lock, token);
- raw_local_irq_restore(flags);
- do {
- count = 1 << 10;
- __ticket_spin_lock_body;
- } while (unlikely(!count) && !xen_spin_wait(lock, &token, flags));
+ lock->owner = raw_smp_processor_id();
}
static __always_inline void __ticket_spin_lock_flags(raw_spinlock_t *lock,
@@ -223,13 +239,15 @@ static __always_inline void __ticket_spi
bool free;
__ticket_spin_lock_preamble;
- if (likely(free))
- return;
- token = xen_spin_adjust(lock, token);
- do {
- count = 1 << 10;
- __ticket_spin_lock_body;
- } while (unlikely(!count) && !xen_spin_wait(lock, &token, flags));
+ if (unlikely(!free)) {
+ token = xen_spin_adjust(lock, token);
+ do {
+ count = __ticket_spin_count(lock);
+ __ticket_spin_lock_body;
+ } while (unlikely(!count)
+ && !xen_spin_wait(lock, &token, flags));
+ }
+ lock->owner = raw_smp_processor_id();
}
static __always_inline void __ticket_spin_unlock(raw_spinlock_t *lock)
@@ -246,6 +264,7 @@ static __always_inline void __ticket_spi
#undef __ticket_spin_lock_preamble
#undef __ticket_spin_lock_body
#undef __ticket_spin_unlock_body
+#undef __ticket_spin_count
#endif
#define __raw_spin(n) __ticket_spin_##n
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/spinlock_types.h 2010-01-18 16:52:32.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/spinlock_types.h 2010-01-26 09:48:51.000000000 +0100
@@ -24,6 +24,11 @@ typedef union {
# define TICKET_SHIFT 16
u16 cur, seq;
#endif
+#if CONFIG_NR_CPUS <= 256
+ u8 owner;
+#else
+ u16 owner;
+#endif
#else
/*
* This differs from the pre-2.6.24 spinlock by always using xchgb
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/time-xen.c 2010-03-01 14:46:13.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/time-xen.c 2010-02-04 09:43:52.000000000 +0100
@@ -64,7 +64,7 @@ static DEFINE_PER_CPU(u64, processed_sto
static DEFINE_PER_CPU(u64, processed_blocked_time);
/* Current runstate of each CPU (updated automatically by the hypervisor). */
-static DEFINE_PER_CPU(struct vcpu_runstate_info, runstate);
+DEFINE_PER_CPU(struct vcpu_runstate_info, runstate);
/* Must be signed, as it's compared with s64 quantities which can be -ve. */
#define NS_PER_TICK (1000000000LL/HZ)
--- sle11sp1-2010-03-22.orig/drivers/xen/core/spinlock.c 2010-02-23 12:31:40.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/core/spinlock.c 2010-03-22 12:58:39.000000000 +0100
@@ -38,6 +38,8 @@ int __cpuinit xen_spinlock_init(unsigned
};
int rc;
+ setup_runstate_area(cpu);
+
rc = bind_ipi_to_irqaction(SPIN_UNLOCK_VECTOR,
cpu,
&spinlock_action);
@@ -85,6 +87,7 @@ unsigned int xen_spin_adjust(const raw_s
bool xen_spin_wait(raw_spinlock_t *lock, unsigned int *ptok,
unsigned int flags)
{
+ unsigned int cpu = raw_smp_processor_id();
int irq = spinlock_irq;
bool rc;
typeof(vcpu_info(0)->evtchn_upcall_mask) upcall_mask;
@@ -92,7 +95,7 @@ bool xen_spin_wait(raw_spinlock_t *lock,
struct spinning spinning, *other;
/* If kicker interrupt not initialized yet, just spin. */
- if (unlikely(irq < 0) || unlikely(!cpu_online(raw_smp_processor_id())))
+ if (unlikely(irq < 0) || unlikely(!cpu_online(cpu)))
return false;
/* announce we're spinning */
@@ -113,6 +116,7 @@ bool xen_spin_wait(raw_spinlock_t *lock,
* we weren't looking.
*/
if (lock->cur == spinning.ticket) {
+ lock->owner = cpu;
/*
* If we interrupted another spinlock while it was
* blocking, make sure it doesn't block (again)
@@ -206,6 +210,8 @@ bool xen_spin_wait(raw_spinlock_t *lock,
if (!free)
token = spin_adjust(other->prev, lock, token);
other->ticket = token >> TICKET_SHIFT;
+ if (lock->cur == other->ticket)
+ lock->owner = cpu;
}
raw_local_irq_restore(upcall_mask);

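The heuristic the xen-spinlock-poll-early patch introduces is small enough to state in isolation: record the owning vCPU in the lock, and when acquiring, spin for a full budget (1 << 10 iterations) only if that owner is currently running; if the owner has been preempted, spin just once and fall straight into the `xen_spin_wait()` poll/block path. A userspace sketch under that assumption, with `spin_count` standing in for the patch's `__ticket_spin_count()` macro:

```c
#include <assert.h>

/* Runstate values as in xen/interface/vcpu.h. */
enum runstate {
    RUNSTATE_running  = 0,
    RUNSTATE_runnable = 1,
    RUNSTATE_blocked  = 2,
    RUNSTATE_offline  = 3,
};

/* Mirrors __ticket_spin_count(): spin for up to 1<<10 iterations while
 * the owner vCPU is actually running; otherwise give up after a single
 * check, since spinning on a preempted owner only burns cycles. */
static unsigned int spin_count(enum runstate owner_state)
{
    return owner_state == RUNSTATE_running ? 1u << 10 : 1u;
}
```

This is why the patch exports the per-CPU `runstate` area from time-xen.c: the lock path needs the hypervisor-maintained runstate of the owner to make this decision.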
40
xen-staging-build Normal file

@@ -0,0 +1,40 @@
From: jbeulich@novell.com
Subject: fix issue with Windows-style types used in drivers/staging/
Patch-mainline: obsolete
--- head-2009-11-20.orig/arch/x86/include/mach-xen/asm/hypervisor.h 2009-11-23 10:44:04.000000000 +0100
+++ head-2009-11-20/arch/x86/include/mach-xen/asm/hypervisor.h 2009-11-23 10:45:08.000000000 +0100
@@ -354,4 +354,9 @@ MULTI_grant_table_op(multicall_entry_t *
#define uvm_multi(cpumask) ((unsigned long)cpumask_bits(cpumask) | UVMF_MULTI)
+#ifdef LINUX
+/* drivers/staging/ use Windows-style types, including VOID */
+#undef VOID
+#endif
+
#endif /* __HYPERVISOR_H__ */
--- head-2009-11-20.orig/drivers/staging/vt6655/ttype.h 2009-11-23 10:15:03.000000000 +0100
+++ head-2009-11-20/drivers/staging/vt6655/ttype.h 2009-10-13 17:02:12.000000000 +0200
@@ -30,6 +30,9 @@
#ifndef __TTYPE_H__
#define __TTYPE_H__
+#ifdef CONFIG_XEN
+#include <asm/hypervisor.h>
+#endif
/******* Common definitions and typedefs ***********************************/
--- head-2009-11-20.orig/drivers/staging/vt6656/ttype.h 2009-11-23 10:15:03.000000000 +0100
+++ head-2009-11-20/drivers/staging/vt6656/ttype.h 2009-10-13 17:02:12.000000000 +0200
@@ -30,6 +30,9 @@
#ifndef __TTYPE_H__
#define __TTYPE_H__
+#ifdef CONFIG_XEN
+#include <asm/hypervisor.h>
+#endif
/******* Common definitions and typedefs ***********************************/

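The clash the xen-staging-build patch works around can be reproduced in miniature: if some header leaves a macro named `VOID` defined, a later Windows-style `typedef void VOID;` in the staging drivers fails to compile. The `#undef` added to hypervisor.h clears the macro before the typedef is seen. A hedged sketch (the macro definition below is an illustrative stand-in for the real conflicting header, not the actual Xen definition):

```c
#include <assert.h>

#define VOID void   /* stand-in for the conflicting macro */

#ifdef VOID
#undef VOID         /* the patch's workaround in hypervisor.h */
#endif

typedef void VOID;  /* staging driver's Windows-style typedef (ttype.h) */
typedef int  BOOL;

static VOID set_flag(BOOL *flag)
{
    *flag = 1;      /* trivial use of the typedefs */
}
```

Without the `#undef`, the typedef would expand to `typedef void void;`, which is a syntax error; with it, both headers coexist.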
32
xen-swiotlb-heuristics Normal file

@@ -0,0 +1,32 @@
From: jbeulich@novell.com
Subject: adjust Xen's swiotlb default size setting
Patch-mainline: obsolete
--- head-2009-10-12.orig/lib/swiotlb-xen.c 2009-10-14 15:52:38.000000000 +0200
+++ head-2009-10-12/lib/swiotlb-xen.c 2009-10-14 16:20:35.000000000 +0200
@@ -211,8 +211,8 @@ swiotlb_init_with_default_size(size_t de
void __init
swiotlb_init(void)
{
- long ram_end;
- size_t defsz = 64 * (1 << 20); /* 64MB default size */
+ unsigned long ram_end;
+ size_t defsz = 64 << 20; /* 64MB default size */
if (swiotlb_force == 1) {
swiotlb = 1;
@@ -221,8 +221,12 @@ swiotlb_init(void)
is_initial_xendomain()) {
/* Domain 0 always has a swiotlb. */
ram_end = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);
- if (ram_end <= 0x7ffff)
- defsz = 2 * (1 << 20); /* 2MB on <2GB on systems. */
+ if (ram_end <= 0x1ffff)
+ defsz = 2 << 20; /* 2MB on <512MB systems. */
+ else if (ram_end <= 0x3ffff)
+ defsz = 4 << 20; /* 4MB on <1GB systems. */
+ else if (ram_end <= 0x7ffff)
+ defsz = 8 << 20; /* 8MB on <2GB systems. */
swiotlb = 1;
}

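The swiotlb sizing heuristic above is a pure function of `XENMEM_maximum_ram_page`, which returns the highest RAM page frame number; with 4 KiB pages, `0x1ffff` corresponds to roughly 512 MiB, `0x3ffff` to 1 GiB, and `0x7ffff` to 2 GiB. A userspace sketch of the post-patch logic (`swiotlb_default_size` is an illustrative name, not a kernel symbol):

```c
#include <assert.h>
#include <stddef.h>

/* Default Xen swiotlb size for dom0 as a function of the highest RAM
 * page frame number (4 KiB pages), mirroring the patched heuristics. */
static size_t swiotlb_default_size(unsigned long ram_end)
{
    if (ram_end <= 0x1ffff)
        return (size_t)2 << 20;   /* 2 MiB on <512 MiB systems */
    if (ram_end <= 0x3ffff)
        return (size_t)4 << 20;   /* 4 MiB on <1 GiB systems */
    if (ram_end <= 0x7ffff)
        return (size_t)8 << 20;   /* 8 MiB on <2 GiB systems */
    return (size_t)64 << 20;      /* 64 MiB default otherwise */
}
```

The patch replaces the original single cutoff (2 MiB below 2 GiB, 64 MiB above) with this graded scale, so small domains no longer reserve a disproportionate bounce buffer.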
506
xen-sysdev-suspend Normal file

@@ -0,0 +1,506 @@
From: jbeulich@novell.com
Subject: use base kernel suspend/resume infrastructure
Patch-mainline: obsolete
... rather than calling just a few functions explicitly.
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/time-xen.c 2010-03-01 14:45:54.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/kernel/time-xen.c 2010-03-01 14:46:04.000000000 +0100
@@ -69,6 +69,10 @@ static DEFINE_PER_CPU(struct vcpu_runsta
/* Must be signed, as it's compared with s64 quantities which can be -ve. */
#define NS_PER_TICK (1000000000LL/HZ)
+static struct vcpu_set_periodic_timer xen_set_periodic_tick = {
+ .period_ns = NS_PER_TICK
+};
+
static void __clock_was_set(struct work_struct *unused)
{
clock_was_set();
@@ -559,6 +563,17 @@ void mark_tsc_unstable(char *reason)
}
EXPORT_SYMBOL_GPL(mark_tsc_unstable);
+static void init_missing_ticks_accounting(unsigned int cpu)
+{
+ struct vcpu_runstate_info *runstate = setup_runstate_area(cpu);
+
+ per_cpu(processed_blocked_time, cpu) =
+ runstate->time[RUNSTATE_blocked];
+ per_cpu(processed_stolen_time, cpu) =
+ runstate->time[RUNSTATE_runnable] +
+ runstate->time[RUNSTATE_offline];
+}
+
static cycle_t cs_last;
static cycle_t xen_clocksource_read(struct clocksource *cs)
@@ -595,11 +610,32 @@ static cycle_t xen_clocksource_read(stru
#endif
}
+/* No locking required. Interrupts are disabled on all CPUs. */
static void xen_clocksource_resume(void)
{
- extern void time_resume(void);
+ unsigned int cpu;
+
+ init_cpu_khz();
+
+ for_each_online_cpu(cpu) {
+ switch (HYPERVISOR_vcpu_op(VCPUOP_set_periodic_timer, cpu,
+ &xen_set_periodic_tick)) {
+ case 0:
+#if CONFIG_XEN_COMPAT <= 0x030004
+ case -ENOSYS:
+#endif
+ break;
+ default:
+ BUG();
+ }
+ get_time_values_from_xen(cpu);
+ per_cpu(processed_system_time, cpu) =
+ per_cpu(shadow_time, 0).system_timestamp;
+ init_missing_ticks_accounting(cpu);
+ }
+
+ processed_system_time = per_cpu(shadow_time, 0).system_timestamp;
- time_resume();
cs_last = local_clock();
}
@@ -631,17 +667,6 @@ struct vcpu_runstate_info *setup_runstat
return runstate;
}
-static void init_missing_ticks_accounting(unsigned int cpu)
-{
- struct vcpu_runstate_info *runstate = setup_runstate_area(cpu);
-
- per_cpu(processed_blocked_time, cpu) =
- runstate->time[RUNSTATE_blocked];
- per_cpu(processed_stolen_time, cpu) =
- runstate->time[RUNSTATE_runnable] +
- runstate->time[RUNSTATE_offline];
-}
-
void xen_read_persistent_clock(struct timespec *ts)
{
const shared_info_t *s = HYPERVISOR_shared_info;
@@ -687,10 +712,6 @@ static void __init setup_cpu0_timer_irq(
BUG_ON(per_cpu(timer_irq, 0) < 0);
}
-static struct vcpu_set_periodic_timer xen_set_periodic_tick = {
- .period_ns = NS_PER_TICK
-};
-
void __init time_init(void)
{
init_cpu_khz();
@@ -828,35 +849,6 @@ void xen_halt(void)
}
EXPORT_SYMBOL(xen_halt);
-/* No locking required. Interrupts are disabled on all CPUs. */
-void time_resume(void)
-{
- unsigned int cpu;
-
- init_cpu_khz();
-
- for_each_online_cpu(cpu) {
- switch (HYPERVISOR_vcpu_op(VCPUOP_set_periodic_timer, cpu,
- &xen_set_periodic_tick)) {
- case 0:
-#if CONFIG_XEN_COMPAT <= 0x030004
- case -ENOSYS:
-#endif
- break;
- default:
- BUG();
- }
- get_time_values_from_xen(cpu);
- per_cpu(processed_system_time, cpu) =
- per_cpu(shadow_time, 0).system_timestamp;
- init_missing_ticks_accounting(cpu);
- }
-
- processed_system_time = per_cpu(shadow_time, 0).system_timestamp;
-
- update_wallclock();
-}
-
#ifdef CONFIG_SMP
static char timer_name[NR_CPUS][15];
--- sle11sp1-2010-03-01.orig/drivers/xen/core/evtchn.c 2009-11-06 11:04:38.000000000 +0100
+++ sle11sp1-2010-03-01/drivers/xen/core/evtchn.c 2010-02-09 17:18:45.000000000 +0100
@@ -35,6 +35,7 @@
#include <linux/interrupt.h>
#include <linux/sched.h>
#include <linux/kernel_stat.h>
+#include <linux/sysdev.h>
#include <linux/ftrace.h>
#include <linux/version.h>
#include <asm/atomic.h>
@@ -1115,10 +1116,21 @@ static void restore_cpu_ipis(unsigned in
}
}
-void irq_resume(void)
+static int evtchn_resume(struct sys_device *dev)
{
unsigned int cpu, irq, evtchn;
struct irq_cfg *cfg;
+ struct evtchn_status status;
+
+ /* Avoid doing anything in the 'suspend cancelled' case. */
+ status.dom = DOMID_SELF;
+ status.port = evtchn_from_irq(percpu_read(virq_to_irq[VIRQ_TIMER]));
+ if (HYPERVISOR_event_channel_op(EVTCHNOP_status, &status))
+ BUG();
+ if (status.status == EVTCHNSTAT_virq
+ && status.vcpu == smp_processor_id()
+ && status.u.virq == VIRQ_TIMER)
+ return 0;
init_evtchn_cpu_bindings();
@@ -1154,7 +1166,32 @@ void irq_resume(void)
restore_cpu_ipis(cpu);
}
+ return 0;
+}
+
+static struct sysdev_class evtchn_sysclass = {
+ .name = "evtchn",
+ .resume = evtchn_resume,
+};
+
+static struct sys_device device_evtchn = {
+ .id = 0,
+ .cls = &evtchn_sysclass,
+};
+
+static int __init evtchn_register(void)
+{
+ int err;
+
+ if (is_initial_xendomain())
+ return 0;
+
+ err = sysdev_class_register(&evtchn_sysclass);
+ if (!err)
+ err = sysdev_register(&device_evtchn);
+ return err;
}
+core_initcall(evtchn_register);
#endif
int __init arch_early_irq_init(void)
--- sle11sp1-2010-03-01.orig/drivers/xen/core/gnttab.c 2009-12-15 09:24:56.000000000 +0100
+++ sle11sp1-2010-03-01/drivers/xen/core/gnttab.c 2009-12-15 09:28:00.000000000 +0100
@@ -35,6 +35,7 @@
#include <linux/sched.h>
#include <linux/mm.h>
#include <linux/seqlock.h>
+#include <linux/sysdev.h>
#include <xen/interface/xen.h>
#include <xen/gnttab.h>
#include <asm/pgtable.h>
@@ -707,23 +708,37 @@ EXPORT_SYMBOL(gnttab_post_map_adjust);
#endif /* __HAVE_ARCH_PTE_SPECIAL */
-int gnttab_resume(void)
+static int gnttab_resume(struct sys_device *dev)
{
if (max_nr_grant_frames() < nr_grant_frames)
return -ENOSYS;
return gnttab_map(0, nr_grant_frames - 1);
}
+#define gnttab_resume() gnttab_resume(NULL)
#ifdef CONFIG_PM_SLEEP
-int gnttab_suspend(void)
-{
#ifdef CONFIG_X86
+static int gnttab_suspend(struct sys_device *dev, pm_message_t state)
+{
apply_to_page_range(&init_mm, (unsigned long)shared,
PAGE_SIZE * nr_grant_frames,
unmap_pte_fn, NULL);
-#endif
return 0;
}
+#else
+#define gnttab_suspend NULL
+#endif
+
+static struct sysdev_class gnttab_sysclass = {
+ .name = "gnttab",
+ .resume = gnttab_resume,
+ .suspend = gnttab_suspend,
+};
+
+static struct sys_device device_gnttab = {
+ .id = 0,
+ .cls = &gnttab_sysclass,
+};
#endif
#else /* !CONFIG_XEN */
@@ -803,6 +818,17 @@ int __devinit gnttab_init(void)
if (!is_running_on_xen())
return -ENODEV;
+#if defined(CONFIG_XEN) && defined(CONFIG_PM_SLEEP)
+ if (!is_initial_xendomain()) {
+ int err = sysdev_class_register(&gnttab_sysclass);
+
+ if (!err)
+ err = sysdev_register(&device_gnttab);
+ if (err)
+ return err;
+ }
+#endif
+
nr_grant_frames = 1;
boot_max_nr_grant_frames = __max_nr_grant_frames();
--- sle11sp1-2010-03-01.orig/drivers/xen/core/machine_reboot.c 2009-12-18 13:34:27.000000000 +0100
+++ sle11sp1-2010-03-01/drivers/xen/core/machine_reboot.c 2009-12-18 14:19:13.000000000 +0100
@@ -17,6 +17,7 @@
#include <xen/xencons.h>
#include <xen/cpu_hotplug.h>
#include <xen/interface/vcpu.h>
+#include "../../base/base.h"
#if defined(__i386__) || defined(__x86_64__)
#include <asm/pci_x86.h>
@@ -145,47 +146,28 @@ struct suspend {
static int take_machine_down(void *_suspend)
{
struct suspend *suspend = _suspend;
- int suspend_cancelled, err;
- extern void time_resume(void);
+ int suspend_cancelled;
- if (suspend->fast_suspend) {
- BUG_ON(!irqs_disabled());
- } else {
- BUG_ON(irqs_disabled());
-
- for (;;) {
- err = smp_suspend();
- if (err)
- return err;
-
- xenbus_suspend();
- preempt_disable();
-
- if (num_online_cpus() == 1)
- break;
-
- preempt_enable();
- xenbus_suspend_cancel();
- }
-
- local_irq_disable();
- }
+ BUG_ON(!irqs_disabled());
mm_pin_all();
- gnttab_suspend();
- pre_suspend();
-
- /*
- * This hypercall returns 1 if suspend was cancelled or the domain was
- * merely checkpointed, and 0 if it is resuming in a new domain.
- */
- suspend_cancelled = HYPERVISOR_suspend(virt_to_mfn(xen_start_info));
+ suspend_cancelled = sysdev_suspend(PMSG_SUSPEND);
+ if (!suspend_cancelled) {
+ pre_suspend();
+ /*
+ * This hypercall returns 1 if suspend was cancelled or the domain was
+ * merely checkpointed, and 0 if it is resuming in a new domain.
+ */
+ suspend_cancelled = HYPERVISOR_suspend(virt_to_mfn(xen_start_info));
+ } else
+ BUG_ON(suspend_cancelled > 0);
suspend->resume_notifier(suspend_cancelled);
- post_suspend(suspend_cancelled);
- gnttab_resume();
+ if (suspend_cancelled >= 0) {
+ post_suspend(suspend_cancelled);
+ sysdev_resume();
+ }
if (!suspend_cancelled) {
- irq_resume();
#ifdef __x86_64__
/*
* Older versions of Xen do not save/restore the user %cr3.
@@ -197,10 +179,6 @@ static int take_machine_down(void *_susp
current->active_mm->pgd)));
#endif
}
- time_resume();
-
- if (!suspend->fast_suspend)
- local_irq_enable();
return suspend_cancelled;
}
@@ -208,8 +186,14 @@ static int take_machine_down(void *_susp
int __xen_suspend(int fast_suspend, void (*resume_notifier)(int))
{
int err, suspend_cancelled;
+ const char *what;
struct suspend suspend;
+#define _check(fn, args...) ({ \
+ what = #fn; \
+ err = (fn)(args); \
+})
+
BUG_ON(smp_processor_id() != 0);
BUG_ON(in_interrupt());
@@ -225,41 +209,91 @@ int __xen_suspend(int fast_suspend, void
if (num_possible_cpus() == 1)
fast_suspend = 0;
- if (fast_suspend) {
- err = stop_machine_create();
- if (err)
- return err;
+ if (fast_suspend && _check(stop_machine_create)) {
+ printk(KERN_ERR "%s() failed: %d\n", what, err);
+ return err;
}
suspend.fast_suspend = fast_suspend;
suspend.resume_notifier = resume_notifier;
+ if (_check(dpm_suspend_start, PMSG_SUSPEND)) {
+ if (fast_suspend)
+ stop_machine_destroy();
+ printk(KERN_ERR "%s() failed: %d\n", what, err);
+ return err;
+ }
+
if (fast_suspend) {
xenbus_suspend();
+
+ if (_check(dpm_suspend_noirq, PMSG_SUSPEND)) {
+ xenbus_suspend_cancel();
+ dpm_resume_end(PMSG_RESUME);
+ stop_machine_destroy();
+ printk(KERN_ERR "%s() failed: %d\n", what, err);
+ return err;
+ }
+
err = stop_machine(take_machine_down, &suspend,
&cpumask_of_cpu(0));
if (err < 0)
xenbus_suspend_cancel();
} else {
+ BUG_ON(irqs_disabled());
+
+ for (;;) {
+ xenbus_suspend();
+
+ if (!_check(dpm_suspend_noirq, PMSG_SUSPEND)
+ && _check(smp_suspend))
+ dpm_resume_noirq(PMSG_RESUME);
+ if (err) {
+ xenbus_suspend_cancel();
+ dpm_resume_end(PMSG_RESUME);
+ printk(KERN_ERR "%s() failed: %d\n",
+ what, err);
+ return err;
+ }
+
+ preempt_disable();
+
+ if (num_online_cpus() == 1)
+ break;
+
+ preempt_enable();
+
+ dpm_resume_noirq(PMSG_RESUME);
+
+ xenbus_suspend_cancel();
+ }
+
+ local_irq_disable();
err = take_machine_down(&suspend);
+ local_irq_enable();
}
- if (err < 0)
- return err;
+ dpm_resume_noirq(PMSG_RESUME);
- suspend_cancelled = err;
- if (!suspend_cancelled) {
- xencons_resume();
- xenbus_resume();
- } else {
- xenbus_suspend_cancel();
+ if (err >= 0) {
+ suspend_cancelled = err;
+ if (!suspend_cancelled) {
+ xencons_resume();
+ xenbus_resume();
+ } else {
+ xenbus_suspend_cancel();
+ err = 0;
+ }
+
+ if (!fast_suspend)
+ smp_resume();
}
- if (!fast_suspend)
- smp_resume();
- else
+ dpm_resume_end(PMSG_RESUME);
+
+ if (fast_suspend)
stop_machine_destroy();
- return 0;
+ return err;
}
#endif
--- sle11sp1-2010-03-01.orig/include/xen/evtchn.h 2009-12-18 10:10:04.000000000 +0100
+++ sle11sp1-2010-03-01/include/xen/evtchn.h 2009-12-18 10:13:12.000000000 +0100
@@ -107,7 +107,9 @@ int bind_ipi_to_irqhandler(
*/
void unbind_from_irqhandler(unsigned int irq, void *dev_id);
+#ifndef CONFIG_XEN
void irq_resume(void);
+#endif
/* Entry point for notifications into Linux subsystems. */
asmlinkage void evtchn_do_upcall(struct pt_regs *regs);
--- sle11sp1-2010-03-01.orig/include/xen/gnttab.h 2008-11-04 11:13:10.000000000 +0100
+++ sle11sp1-2010-03-01/include/xen/gnttab.h 2009-11-06 11:10:15.000000000 +0100
@@ -110,8 +110,9 @@ static inline void __gnttab_dma_unmap_pa
void gnttab_reset_grant_page(struct page *page);
-int gnttab_suspend(void);
+#ifndef CONFIG_XEN
int gnttab_resume(void);
+#endif
void *arch_gnttab_alloc_shared(unsigned long *frames);

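One idiom worth noting in `__xen_suspend()` above is the `_check()` macro: it calls a function while recording its name, so a single error path can report which suspend step failed. It relies on GNU statement expressions, as the original does. A self-contained sketch with illustrative step functions (`step_ok`, `step_fail`, `run` are not kernel symbols):

```c
#include <assert.h>
#include <string.h>

/* The patch's _check() idiom (GNU statement expression): evaluate fn,
 * stash its name in `what` and its return value in `err`, and yield err
 * so the macro can sit directly in an if() condition. */
#define _check(fn, args...) ({ what = #fn; err = (fn)(args); })

static int step_ok(void)   { return 0; }
static int step_fail(void) { return -22; }  /* -EINVAL */

/* Runs the steps in order; on failure returns the failing step's name
 * and stores its error code, mimicking the printk()s in the patch. */
static const char *run(int *out_err)
{
    const char *what = NULL;
    int err;

    if (_check(step_ok) || _check(step_fail)) {
        *out_err = err;
        return what;    /* name of the step that failed */
    }
    *out_err = 0;
    return NULL;
}
```

Compared with the pre-patch code, this keeps the growing chain of `dpm_suspend_start()` / `dpm_suspend_noirq()` / `smp_suspend()` calls readable while still producing a precise "%s() failed" message.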
292
xen-unpriv-build Normal file

@@ -0,0 +1,292 @@
From: jbeulich@novell.com
Subject: no need to build certain bits when building non-privileged kernel
Patch-mainline: n/a
--- sle11sp1-2010-03-29.orig/arch/x86/Kconfig 2010-02-09 17:06:32.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/Kconfig 2010-02-09 17:19:17.000000000 +0100
@@ -698,6 +698,7 @@ config HPET_EMULATE_RTC
config DMI
default y
bool "Enable DMI scanning" if EMBEDDED
+ depends on !XEN_UNPRIVILEGED_GUEST
---help---
Enabled scanning of DMI to identify machine quirks. Say Y
here unless you have verified that your setup is not
@@ -778,6 +779,7 @@ config AMD_IOMMU_STATS
# need this always selected by IOMMU for the VIA workaround
config SWIOTLB
def_bool y if X86_64 || XEN
+ prompt "Software I/O TLB" if XEN_UNPRIVILEGED_GUEST && !XEN_PCIDEV_FRONTEND
---help---
Support for software bounce buffers used on x86-64 systems
which don't have a hardware IOMMU (e.g. the current generation
@@ -1974,13 +1976,15 @@ config PCI_GOBIOS
config PCI_GOMMCONFIG
bool "MMConfig"
+ depends on !XEN_UNPRIVILEGED_GUEST
config PCI_GODIRECT
bool "Direct"
+ depends on !XEN_UNPRIVILEGED_GUEST
config PCI_GOOLPC
bool "OLPC"
- depends on OLPC
+ depends on OLPC && !XEN_UNPRIVILEGED_GUEST
config PCI_GOXEN_FE
bool "Xen PCI Frontend"
@@ -1991,6 +1995,7 @@ config PCI_GOXEN_FE
config PCI_GOANY
bool "Any"
+ depends on !XEN_UNPRIVILEGED_GUEST
endchoice
@@ -2021,7 +2026,7 @@ config PCI_MMCONFIG
config XEN_PCIDEV_FRONTEND
def_bool y
- prompt "Xen PCI Frontend" if X86_64
+ prompt "Xen PCI Frontend" if X86_64 && !XEN_UNPRIVILEGED_GUEST
depends on PCI && XEN && (PCI_GOXEN_FE || PCI_GOANY || X86_64)
select HOTPLUG
help
@@ -2226,7 +2231,9 @@ source "net/Kconfig"
source "drivers/Kconfig"
+if !XEN_UNPRIVILEGED_GUEST
source "drivers/firmware/Kconfig"
+endif
source "fs/Kconfig"
--- sle11sp1-2010-03-29.orig/arch/x86/include/mach-xen/asm/swiotlb.h 2009-11-06 10:51:32.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/include/mach-xen/asm/swiotlb.h 2010-01-27 15:05:03.000000000 +0100
@@ -1,4 +1,8 @@
#include_next <asm/swiotlb.h>
+#ifndef CONFIG_SWIOTLB
+#define swiotlb_init()
+#endif
+
dma_addr_t swiotlb_map_single_phys(struct device *, phys_addr_t, size_t size,
int dir);
--- sle11sp1-2010-03-29.orig/drivers/firmware/Kconfig 2009-11-06 10:51:32.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/firmware/Kconfig 2009-11-06 11:10:32.000000000 +0100
@@ -114,7 +114,7 @@ config DMIID
config ISCSI_IBFT_FIND
bool "iSCSI Boot Firmware Table Attributes"
- depends on X86 && !XEN_UNPRIVILEGED_GUEST
+ depends on X86
default n
help
This option enables the kernel to find the region of memory
--- sle11sp1-2010-03-29.orig/drivers/xen/Kconfig 2010-03-29 09:13:14.000000000 +0200
+++ sle11sp1-2010-03-29/drivers/xen/Kconfig 2010-03-29 09:13:58.000000000 +0200
@@ -274,6 +274,7 @@ config XEN_USB_FRONTEND_HCD_PM
config XEN_GRANT_DEV
tristate "User-space granted page access driver"
+ depends on XEN_BACKEND != n
default XEN_PRIVILEGED_GUEST
help
Device for accessing (in user-space) pages that have been granted
--- sle11sp1-2010-03-29.orig/drivers/xen/balloon/balloon.c 2010-02-02 15:08:54.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/balloon/balloon.c 2010-03-31 10:00:17.000000000 +0200
@@ -663,6 +663,9 @@ void balloon_update_driver_allowance(lon
bs.driver_pages += delta;
balloon_unlock(flags);
}
+EXPORT_SYMBOL_GPL(balloon_update_driver_allowance);
+
+#if defined(CONFIG_XEN_BACKEND) || defined(CONFIG_XEN_BACKEND_MODULE)
#ifdef CONFIG_XEN
static int dealloc_pte_fn(
@@ -771,6 +774,7 @@ struct page **alloc_empty_pages_and_page
pagevec = NULL;
goto out;
}
+EXPORT_SYMBOL_GPL(alloc_empty_pages_and_pagevec);
void free_empty_pages_and_pagevec(struct page **pagevec, int nr_pages)
{
@@ -791,6 +795,9 @@ void free_empty_pages_and_pagevec(struct
schedule_work(&balloon_worker);
}
+EXPORT_SYMBOL_GPL(free_empty_pages_and_pagevec);
+
+#endif /* CONFIG_XEN_BACKEND */
void balloon_release_driver_page(struct page *page)
{
@@ -804,10 +811,6 @@ void balloon_release_driver_page(struct
schedule_work(&balloon_worker);
}
-
-EXPORT_SYMBOL_GPL(balloon_update_driver_allowance);
-EXPORT_SYMBOL_GPL(alloc_empty_pages_and_pagevec);
-EXPORT_SYMBOL_GPL(free_empty_pages_and_pagevec);
EXPORT_SYMBOL_GPL(balloon_release_driver_page);
MODULE_LICENSE("Dual BSD/GPL");
--- sle11sp1-2010-03-29.orig/drivers/xen/core/Makefile 2010-01-04 16:17:00.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/core/Makefile 2009-11-06 11:10:32.000000000 +0100
@@ -2,9 +2,10 @@
# Makefile for the linux kernel.
#
-obj-y := evtchn.o gnttab.o reboot.o machine_reboot.o firmware.o
+obj-y := evtchn.o gnttab.o reboot.o machine_reboot.o
-obj-$(CONFIG_PCI) += pci.o
+priv-$(CONFIG_PCI) += pci.o
+obj-$(CONFIG_XEN_PRIVILEGED_GUEST) += firmware.o $(priv-y)
obj-$(CONFIG_PROC_FS) += xen_proc.o
obj-$(CONFIG_SYS_HYPERVISOR) += hypervisor_sysfs.o
obj-$(CONFIG_HOTPLUG_CPU) += cpu_hotplug.o
--- sle11sp1-2010-03-29.orig/drivers/xen/core/gnttab.c 2010-02-02 15:10:01.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/core/gnttab.c 2009-12-15 09:29:45.000000000 +0100
@@ -437,8 +437,6 @@ static inline unsigned int max_nr_grant_
#ifdef CONFIG_XEN
-static DEFINE_SEQLOCK(gnttab_dma_lock);
-
#ifdef CONFIG_X86
static int map_pte_fn(pte_t *pte, struct page *pmd_page,
unsigned long addr, void *data)
@@ -508,6 +506,10 @@ static int gnttab_map(unsigned int start
return 0;
}
+#if defined(CONFIG_XEN_BACKEND) || defined(CONFIG_XEN_BACKEND_MODULE)
+
+static DEFINE_SEQLOCK(gnttab_dma_lock);
+
static void gnttab_page_free(struct page *page, unsigned int order)
{
BUG_ON(order);
@@ -639,6 +641,8 @@ void __gnttab_dma_map_page(struct page *
} while (unlikely(read_seqretry(&gnttab_dma_lock, seq)));
}
+#endif /* CONFIG_XEN_BACKEND */
+
#ifdef __HAVE_ARCH_PTE_SPECIAL
static unsigned int GNTMAP_pte_special;
--- sle11sp1-2010-03-29.orig/drivers/xen/privcmd/Makefile 2007-07-10 09:42:30.000000000 +0200
+++ sle11sp1-2010-03-29/drivers/xen/privcmd/Makefile 2009-12-18 08:20:46.000000000 +0100
@@ -1,3 +1,3 @@
-
-obj-y += privcmd.o
-obj-$(CONFIG_COMPAT) += compat_privcmd.o
+priv-$(CONFIG_COMPAT) := compat_privcmd.o
+obj-y := privcmd.o
+obj-$(CONFIG_XEN_PRIVILEGED_GUEST) += $(priv-y)
--- sle11sp1-2010-03-29.orig/drivers/xen/privcmd/privcmd.c 2010-01-27 14:39:09.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/privcmd/privcmd.c 2010-01-27 15:05:18.000000000 +0100
@@ -33,6 +33,9 @@
static struct proc_dir_entry *privcmd_intf;
static struct proc_dir_entry *capabilities_intf;
+#ifndef CONFIG_XEN_PRIVILEGED_GUEST
+#define HAVE_ARCH_PRIVCMD_MMAP
+#endif
#ifndef HAVE_ARCH_PRIVCMD_MMAP
static int enforce_singleshot_mapping_fn(pte_t *pte, struct page *pmd_page,
unsigned long addr, void *data)
@@ -57,12 +60,14 @@ static long privcmd_ioctl(struct file *f
{
long ret;
void __user *udata = (void __user *) data;
+#ifdef CONFIG_XEN_PRIVILEGED_GUEST
unsigned long i, addr, nr, nr_pages;
int paged_out;
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
LIST_HEAD(pagelist);
struct list_head *l, *l2;
+#endif
switch (cmd) {
case IOCTL_PRIVCMD_HYPERCALL: {
@@ -87,6 +92,8 @@ static long privcmd_ioctl(struct file *f
}
break;
+#ifdef CONFIG_XEN_PRIVILEGED_GUEST
+
case IOCTL_PRIVCMD_MMAP: {
#define MMAP_NR_PER_PAGE \
(unsigned long)((PAGE_SIZE - sizeof(*l)) / sizeof(*msg))
@@ -392,6 +399,8 @@ static long privcmd_ioctl(struct file *f
}
break;
+#endif /* CONFIG_XEN_PRIVILEGED_GUEST */
+
default:
ret = -EINVAL;
break;
@@ -427,7 +436,9 @@ static int privcmd_mmap(struct file * fi
static const struct file_operations privcmd_file_ops = {
.unlocked_ioctl = privcmd_ioctl,
+#ifdef CONFIG_XEN_PRIVILEGED_GUEST
.mmap = privcmd_mmap,
+#endif
};
static int capabilities_read(char *page, char **start, off_t off,
--- sle11sp1-2010-03-29.orig/fs/compat_ioctl.c 2010-03-05 10:13:02.000000000 +0100
+++ sle11sp1-2010-03-29/fs/compat_ioctl.c 2010-03-05 10:25:22.000000000 +0100
@@ -2741,10 +2741,12 @@ IGNORE_IOCTL(FBIOSCURSOR32)
IGNORE_IOCTL(FBIOGCURSOR32)
#endif
-#ifdef CONFIG_XEN
+#ifdef CONFIG_XEN_PRIVILEGED_GUEST
HANDLE_IOCTL(IOCTL_PRIVCMD_MMAP_32, privcmd_ioctl_32)
HANDLE_IOCTL(IOCTL_PRIVCMD_MMAPBATCH_32, privcmd_ioctl_32)
HANDLE_IOCTL(IOCTL_PRIVCMD_MMAPBATCH_V2_32, privcmd_ioctl_32)
+#endif
+#ifdef CONFIG_XEN
COMPATIBLE_IOCTL(IOCTL_PRIVCMD_HYPERCALL)
COMPATIBLE_IOCTL(IOCTL_EVTCHN_BIND_VIRQ)
COMPATIBLE_IOCTL(IOCTL_EVTCHN_BIND_INTERDOMAIN)
--- sle11sp1-2010-03-29.orig/include/xen/firmware.h 2007-07-02 08:16:19.000000000 +0200
+++ sle11sp1-2010-03-29/include/xen/firmware.h 2009-11-06 11:10:32.000000000 +0100
@@ -5,6 +5,10 @@
void copy_edd(void);
#endif
+#ifdef CONFIG_XEN_PRIVILEGED_GUEST
void copy_edid(void);
+#else
+static inline void copy_edid(void) {}
+#endif
#endif /* __XEN_FIRMWARE_H__ */
--- sle11sp1-2010-03-29.orig/include/xen/gnttab.h 2009-11-06 11:10:15.000000000 +0100
+++ sle11sp1-2010-03-29/include/xen/gnttab.h 2009-12-15 09:54:17.000000000 +0100
@@ -103,7 +103,11 @@ void gnttab_grant_foreign_transfer_ref(g
unsigned long pfn);
int gnttab_copy_grant_page(grant_ref_t ref, struct page **pagep);
+#if defined(CONFIG_XEN_BACKEND) || defined(CONFIG_XEN_BACKEND_MODULE)
void __gnttab_dma_map_page(struct page *page);
+#else
+#define __gnttab_dma_map_page __gnttab_dma_unmap_page
+#endif
static inline void __gnttab_dma_unmap_page(struct page *page)
{
}

649
xen-virq-per-cpu-irq Normal file

@@ -0,0 +1,649 @@
From: jbeulich@novell.com
Subject: fold per-CPU VIRQs onto a single IRQ each
Patch-mainline: obsolete
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/time-xen.c 2010-03-01 14:46:04.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/time-xen.c 2010-03-01 14:46:13.000000000 +0100
@@ -697,19 +697,17 @@ int xen_update_persistent_clock(void)
}
/* Dynamically-mapped IRQ. */
-DEFINE_PER_CPU(int, timer_irq);
+static int __read_mostly timer_irq = -1;
+static struct irqaction timer_action = {
+ .handler = timer_interrupt,
+ .flags = IRQF_DISABLED|IRQF_TIMER,
+ .name = "timer"
+};
static void __init setup_cpu0_timer_irq(void)
{
- per_cpu(timer_irq, 0) =
- bind_virq_to_irqhandler(
- VIRQ_TIMER,
- 0,
- timer_interrupt,
- IRQF_DISABLED|IRQF_TIMER|IRQF_NOBALANCING,
- "timer0",
- NULL);
- BUG_ON(per_cpu(timer_irq, 0) < 0);
+ timer_irq = bind_virq_to_irqaction(VIRQ_TIMER, 0, &timer_action);
+ BUG_ON(timer_irq < 0);
}
void __init time_init(void)
@@ -850,8 +848,6 @@ void xen_halt(void)
EXPORT_SYMBOL(xen_halt);
#ifdef CONFIG_SMP
-static char timer_name[NR_CPUS][15];
-
int __cpuinit local_setup_timer(unsigned int cpu)
{
int seq, irq;
@@ -877,16 +873,10 @@ int __cpuinit local_setup_timer(unsigned
init_missing_ticks_accounting(cpu);
} while (read_seqretry(&xtime_lock, seq));
- sprintf(timer_name[cpu], "timer%u", cpu);
- irq = bind_virq_to_irqhandler(VIRQ_TIMER,
- cpu,
- timer_interrupt,
- IRQF_DISABLED|IRQF_TIMER|IRQF_NOBALANCING,
- timer_name[cpu],
- NULL);
+ irq = bind_virq_to_irqaction(VIRQ_TIMER, cpu, &timer_action);
if (irq < 0)
return irq;
- per_cpu(timer_irq, cpu) = irq;
+ BUG_ON(timer_irq != irq);
return 0;
}
@@ -894,7 +884,7 @@ int __cpuinit local_setup_timer(unsigned
void __cpuinit local_teardown_timer(unsigned int cpu)
{
BUG_ON(cpu == 0);
- unbind_from_irqhandler(per_cpu(timer_irq, cpu), NULL);
+ unbind_from_per_cpu_irq(timer_irq, cpu, &timer_action);
}
#endif
--- sle11sp1-2010-03-22.orig/drivers/xen/core/evtchn.c 2010-02-09 17:18:51.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/core/evtchn.c 2010-02-09 17:18:55.000000000 +0100
@@ -58,6 +58,23 @@ static DEFINE_SPINLOCK(irq_mapping_updat
static int evtchn_to_irq[NR_EVENT_CHANNELS] = {
[0 ... NR_EVENT_CHANNELS-1] = -1 };
+#if defined(CONFIG_SMP) && defined(CONFIG_X86)
+static struct per_cpu_irqaction {
+ struct irqaction action; /* must be first */
+ struct per_cpu_irqaction *next;
+ cpumask_t cpus;
+} *virq_actions[NR_VIRQS];
+/* IRQ <-> VIRQ mapping. */
+static DECLARE_BITMAP(virq_per_cpu, NR_VIRQS) __read_mostly;
+static DEFINE_PER_CPU(int[NR_VIRQS], virq_to_evtchn);
+#define BUG_IF_VIRQ_PER_CPU(irq) \
+ BUG_ON(type_from_irq(irq) == IRQT_VIRQ \
+ && test_bit(index_from_irq(irq), virq_per_cpu))
+#else
+#define BUG_IF_VIRQ_PER_CPU(irq) ((void)(irq))
+#define PER_CPU_VIRQ_IRQ
+#endif
+
/* IRQ <-> IPI mapping. */
#ifndef NR_IPIS
#define NR_IPIS 1
@@ -132,15 +149,6 @@ static inline u32 mk_irq_info(u32 type,
* Accessors for packed IRQ information.
*/
-#ifdef PER_CPU_IPI_IRQ
-static inline unsigned int evtchn_from_irq(int irq)
-{
- const struct irq_cfg *cfg = irq_cfg(irq);
-
- return cfg ? cfg->info & ((1U << _EVTCHN_BITS) - 1) : 0;
-}
-#endif
-
static inline unsigned int index_from_irq(int irq)
{
const struct irq_cfg *cfg = irq_cfg(irq);
@@ -156,24 +164,39 @@ static inline unsigned int type_from_irq
return cfg ? cfg->info >> (32 - _IRQT_BITS) : IRQT_UNBOUND;
}
-#ifndef PER_CPU_IPI_IRQ
static inline unsigned int evtchn_from_per_cpu_irq(unsigned int irq,
unsigned int cpu)
{
- BUG_ON(type_from_irq(irq) != IRQT_IPI);
- return per_cpu(ipi_to_evtchn, cpu)[index_from_irq(irq)];
+ switch (type_from_irq(irq)) {
+#ifndef PER_CPU_VIRQ_IRQ
+ case IRQT_VIRQ:
+ return per_cpu(virq_to_evtchn, cpu)[index_from_irq(irq)];
+#endif
+#ifndef PER_CPU_IPI_IRQ
+ case IRQT_IPI:
+ return per_cpu(ipi_to_evtchn, cpu)[index_from_irq(irq)];
+#endif
+ }
+ BUG();
+ return 0;
}
static inline unsigned int evtchn_from_irq(unsigned int irq)
{
- if (type_from_irq(irq) != IRQT_IPI) {
- const struct irq_cfg *cfg = irq_cfg(irq);
+ const struct irq_cfg *cfg;
- return cfg ? cfg->info & ((1U << _EVTCHN_BITS) - 1) : 0;
+ switch (type_from_irq(irq)) {
+#ifndef PER_CPU_VIRQ_IRQ
+ case IRQT_VIRQ:
+#endif
+#ifndef PER_CPU_IPI_IRQ
+ case IRQT_IPI:
+#endif
+ return evtchn_from_per_cpu_irq(irq, smp_processor_id());
}
- return evtchn_from_per_cpu_irq(irq, smp_processor_id());
+ cfg = irq_cfg(irq);
+ return cfg ? cfg->info & ((1U << _EVTCHN_BITS) - 1) : 0;
}
-#endif
/* IRQ <-> VIRQ mapping. */
DEFINE_PER_CPU(int[NR_VIRQS], virq_to_irq) = {[0 ... NR_VIRQS-1] = -1};
@@ -516,6 +539,14 @@ static int bind_virq_to_irq(unsigned int
evtchn = bind_virq.port;
evtchn_to_irq[evtchn] = irq;
+#ifndef PER_CPU_VIRQ_IRQ
+ {
+ unsigned int cpu;
+
+ for_each_possible_cpu(cpu)
+ per_cpu(virq_to_evtchn, cpu)[virq] = evtchn;
+ }
+#endif
irq_cfg(irq)->info = mk_irq_info(IRQT_VIRQ, virq, evtchn);
per_cpu(virq_to_irq, cpu)[virq] = irq;
@@ -570,7 +601,9 @@ static void unbind_from_irq(unsigned int
unsigned int cpu;
int evtchn = evtchn_from_irq(irq);
+ BUG_IF_VIRQ_PER_CPU(irq);
BUG_IF_IPI(irq);
+
spin_lock(&irq_mapping_update_lock);
if (!--irq_cfg(irq)->bindcount && VALID_EVTCHN(evtchn)) {
@@ -583,6 +616,11 @@ static void unbind_from_irq(unsigned int
case IRQT_VIRQ:
per_cpu(virq_to_irq, cpu_from_evtchn(evtchn))
[index_from_irq(irq)] = -1;
+#ifndef PER_CPU_VIRQ_IRQ
+ for_each_possible_cpu(cpu)
+ per_cpu(virq_to_evtchn, cpu)
+ [index_from_irq(irq)] = 0;
+#endif
break;
#if defined(CONFIG_SMP) && defined(PER_CPU_IPI_IRQ)
case IRQT_IPI:
@@ -612,11 +650,13 @@ static void unbind_from_irq(unsigned int
spin_unlock(&irq_mapping_update_lock);
}
-#if defined(CONFIG_SMP) && !defined(PER_CPU_IPI_IRQ)
-void unbind_from_per_cpu_irq(unsigned int irq, unsigned int cpu)
+#if defined(CONFIG_SMP) && (!defined(PER_CPU_IPI_IRQ) || !defined(PER_CPU_VIRQ_IRQ))
+void unbind_from_per_cpu_irq(unsigned int irq, unsigned int cpu,
+ struct irqaction *action)
{
struct evtchn_close close;
int evtchn = evtchn_from_per_cpu_irq(irq, cpu);
+ struct irqaction *free_action = NULL;
spin_lock(&irq_mapping_update_lock);
@@ -627,6 +667,32 @@ void unbind_from_per_cpu_irq(unsigned in
BUG_ON(irq_cfg(irq)->bindcount <= 1);
irq_cfg(irq)->bindcount--;
+
+#ifndef PER_CPU_VIRQ_IRQ
+ if (type_from_irq(irq) == IRQT_VIRQ) {
+ unsigned int virq = index_from_irq(irq);
+ struct per_cpu_irqaction *cur, *prev = NULL;
+
+ cur = virq_actions[virq];
+ while (cur) {
+ if (cur->action.dev_id == action) {
+ cpu_clear(cpu, cur->cpus);
+ if (cpus_empty(cur->cpus)) {
+ if (prev)
+ prev->next = cur->next;
+ else
+ virq_actions[virq] = cur->next;
+ free_action = action;
+ }
+ } else if (cpu_isset(cpu, cur->cpus))
+ evtchn = 0;
+ cur = (prev = cur)->next;
+ }
+ if (!VALID_EVTCHN(evtchn))
+ goto done;
+ }
+#endif
+
cpumask_clear_cpu(cpu, desc->affinity);
close.port = evtchn;
@@ -634,9 +700,16 @@ void unbind_from_per_cpu_irq(unsigned in
BUG();
switch (type_from_irq(irq)) {
+#ifndef PER_CPU_VIRQ_IRQ
+ case IRQT_VIRQ:
+ per_cpu(virq_to_evtchn, cpu)[index_from_irq(irq)] = 0;
+ break;
+#endif
+#ifndef PER_CPU_IPI_IRQ
case IRQT_IPI:
per_cpu(ipi_to_evtchn, cpu)[index_from_irq(irq)] = 0;
break;
+#endif
default:
BUG();
break;
@@ -648,9 +721,16 @@ void unbind_from_per_cpu_irq(unsigned in
evtchn_to_irq[evtchn] = -1;
}
+#ifndef PER_CPU_VIRQ_IRQ
+done:
+#endif
spin_unlock(&irq_mapping_update_lock);
+
+ if (free_action)
+ free_irq(irq, free_action);
}
-#endif /* CONFIG_SMP && !PER_CPU_IPI_IRQ */
+EXPORT_SYMBOL_GPL(unbind_from_per_cpu_irq);
+#endif /* CONFIG_SMP && (!PER_CPU_IPI_IRQ || !PER_CPU_VIRQ_IRQ) */
int bind_caller_port_to_irqhandler(
unsigned int caller_port,
@@ -732,6 +812,8 @@ int bind_virq_to_irqhandler(
{
int irq, retval;
+ BUG_IF_VIRQ_PER_CPU(virq);
+
irq = bind_virq_to_irq(virq, cpu);
if (irq < 0)
return irq;
@@ -747,6 +829,108 @@ int bind_virq_to_irqhandler(
EXPORT_SYMBOL_GPL(bind_virq_to_irqhandler);
#ifdef CONFIG_SMP
+#ifndef PER_CPU_VIRQ_IRQ
+int bind_virq_to_irqaction(
+ unsigned int virq,
+ unsigned int cpu,
+ struct irqaction *action)
+{
+ struct evtchn_bind_virq bind_virq;
+ int evtchn, irq, retval = 0;
+ struct per_cpu_irqaction *cur = NULL, *new;
+
+ BUG_ON(!test_bit(virq, virq_per_cpu));
+
+ if (action->dev_id)
+ return -EINVAL;
+
+ new = kzalloc(sizeof(*new), GFP_ATOMIC);
+ if (new) {
+ new->action = *action;
+ new->action.dev_id = action;
+ }
+
+ spin_lock(&irq_mapping_update_lock);
+
+ for (cur = virq_actions[virq]; cur; cur = cur->next)
+ if (cur->action.dev_id == action)
+ break;
+ if (!cur) {
+ if (!new) {
+ spin_unlock(&irq_mapping_update_lock);
+ return -ENOMEM;
+ }
+ new->next = virq_actions[virq];
+ virq_actions[virq] = cur = new;
+ retval = 1;
+ }
+ cpu_set(cpu, cur->cpus);
+ action = &cur->action;
+
+ if ((irq = per_cpu(virq_to_irq, cpu)[virq]) == -1) {
+ unsigned int nr;
+
+ BUG_ON(!retval);
+
+ if ((irq = find_unbound_irq(cpu, true)) < 0) {
+ if (cur)
+ virq_actions[virq] = cur->next;
+ spin_unlock(&irq_mapping_update_lock);
+ if (cur != new)
+ kfree(new);
+ return irq;
+ }
+
+ /* Extra reference so count will never drop to zero. */
+ irq_cfg(irq)->bindcount++;
+
+ for_each_possible_cpu(nr)
+ per_cpu(virq_to_irq, nr)[virq] = irq;
+ irq_cfg(irq)->info = mk_irq_info(IRQT_VIRQ, virq, 0);
+ }
+
+ evtchn = per_cpu(virq_to_evtchn, cpu)[virq];
+ if (!VALID_EVTCHN(evtchn)) {
+ bind_virq.virq = virq;
+ bind_virq.vcpu = cpu;
+ if (HYPERVISOR_event_channel_op(EVTCHNOP_bind_virq,
+ &bind_virq) != 0)
+ BUG();
+ evtchn = bind_virq.port;
+ evtchn_to_irq[evtchn] = irq;
+ per_cpu(virq_to_evtchn, cpu)[virq] = evtchn;
+
+ bind_evtchn_to_cpu(evtchn, cpu);
+ }
+
+ irq_cfg(irq)->bindcount++;
+
+ spin_unlock(&irq_mapping_update_lock);
+
+ if (cur != new)
+ kfree(new);
+
+ if (retval == 0) {
+ unsigned long flags;
+
+ local_irq_save(flags);
+ unmask_evtchn(evtchn);
+ local_irq_restore(flags);
+ } else {
+ action->flags |= IRQF_PERCPU;
+ retval = setup_irq(irq, action);
+ if (retval) {
+ unbind_from_per_cpu_irq(irq, cpu, cur->action.dev_id);
+ BUG_ON(retval > 0);
+ irq = retval;
+ }
+ }
+
+ return irq;
+}
+EXPORT_SYMBOL_GPL(bind_virq_to_irqaction);
+#endif
+
#ifdef PER_CPU_IPI_IRQ
int bind_ipi_to_irqhandler(
unsigned int ipi,
@@ -826,7 +1010,7 @@ int __cpuinit bind_ipi_to_irqaction(
action->flags |= IRQF_PERCPU | IRQF_NO_SUSPEND;
retval = setup_irq(irq, action);
if (retval) {
- unbind_from_per_cpu_irq(irq, cpu);
+ unbind_from_per_cpu_irq(irq, cpu, NULL);
BUG_ON(retval > 0);
irq = retval;
}
@@ -861,7 +1045,9 @@ static void rebind_irq_to_cpu(unsigned i
{
int evtchn = evtchn_from_irq(irq);
+ BUG_IF_VIRQ_PER_CPU(irq);
BUG_IF_IPI(irq);
+
if (VALID_EVTCHN(evtchn))
rebind_evtchn_to_cpu(evtchn, tcpu);
}
@@ -1141,7 +1327,9 @@ void notify_remote_via_irq(int irq)
{
int evtchn = evtchn_from_irq(irq);
+ BUG_ON(type_from_irq(irq) == IRQT_VIRQ);
BUG_IF_IPI(irq);
+
if (VALID_EVTCHN(evtchn))
notify_remote_via_evtchn(evtchn);
}
@@ -1149,6 +1337,7 @@ EXPORT_SYMBOL_GPL(notify_remote_via_irq)
int irq_to_evtchn_port(int irq)
{
+ BUG_IF_VIRQ_PER_CPU(irq);
BUG_IF_IPI(irq);
return evtchn_from_irq(irq);
}
@@ -1243,6 +1432,12 @@ static void restore_cpu_virqs(unsigned i
if ((irq = per_cpu(virq_to_irq, cpu)[virq]) == -1)
continue;
+#ifndef PER_CPU_VIRQ_IRQ
+ if (test_bit(virq, virq_per_cpu)
+ && !VALID_EVTCHN(per_cpu(virq_to_evtchn, cpu)[virq]))
+ continue;
+#endif
+
BUG_ON(irq_cfg(irq)->info != mk_irq_info(IRQT_VIRQ, virq, 0));
/* Get a new binding from Xen. */
@@ -1255,7 +1450,20 @@ static void restore_cpu_virqs(unsigned i
/* Record the new mapping. */
evtchn_to_irq[evtchn] = irq;
+#ifdef PER_CPU_VIRQ_IRQ
irq_cfg(irq)->info = mk_irq_info(IRQT_VIRQ, virq, evtchn);
+#else
+ if (test_bit(virq, virq_per_cpu))
+ per_cpu(virq_to_evtchn, cpu)[virq] = evtchn;
+ else {
+ unsigned int cpu;
+
+ irq_cfg(irq)->info = mk_irq_info(IRQT_VIRQ, virq,
+ evtchn);
+ for_each_possible_cpu(cpu)
+ per_cpu(virq_to_evtchn, cpu)[virq] = evtchn;
+ }
+#endif
bind_evtchn_to_cpu(evtchn, cpu);
/* Ready for use. */
@@ -1311,7 +1519,11 @@ static int evtchn_resume(struct sys_devi
/* Avoid doing anything in the 'suspend cancelled' case. */
status.dom = DOMID_SELF;
+#ifdef PER_CPU_VIRQ_IRQ
status.port = evtchn_from_irq(percpu_read(virq_to_irq[VIRQ_TIMER]));
+#else
+ status.port = percpu_read(virq_to_evtchn[VIRQ_TIMER]);
+#endif
if (HYPERVISOR_event_channel_op(EVTCHNOP_status, &status))
BUG();
if (status.status == EVTCHNSTAT_virq
@@ -1540,6 +1752,15 @@ void __init xen_init_IRQ(void)
unsigned int i;
struct physdev_pirq_eoi_gmfn eoi_gmfn;
+#ifndef PER_CPU_VIRQ_IRQ
+ __set_bit(VIRQ_TIMER, virq_per_cpu);
+ __set_bit(VIRQ_DEBUG, virq_per_cpu);
+ __set_bit(VIRQ_XENOPROF, virq_per_cpu);
+#ifdef CONFIG_IA64
+ __set_bit(VIRQ_ITC, virq_per_cpu);
+#endif
+#endif
+
init_evtchn_cpu_bindings();
i = get_order(sizeof(unsigned long) * BITS_TO_LONGS(nr_pirqs));
--- sle11sp1-2010-03-22.orig/drivers/xen/core/smpboot.c 2010-03-22 12:57:46.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/core/smpboot.c 2010-03-22 12:57:50.000000000 +0100
@@ -176,13 +176,13 @@ static int __cpuinit xen_smp_intr_init(u
fail:
xen_spinlock_cleanup(cpu);
unbind_reboot:
- unbind_from_per_cpu_irq(reboot_irq, cpu);
+ unbind_from_per_cpu_irq(reboot_irq, cpu, NULL);
unbind_call1:
- unbind_from_per_cpu_irq(call1func_irq, cpu);
+ unbind_from_per_cpu_irq(call1func_irq, cpu, NULL);
unbind_call:
- unbind_from_per_cpu_irq(callfunc_irq, cpu);
+ unbind_from_per_cpu_irq(callfunc_irq, cpu, NULL);
unbind_resched:
- unbind_from_per_cpu_irq(resched_irq, cpu);
+ unbind_from_per_cpu_irq(resched_irq, cpu, NULL);
return rc;
}
@@ -192,10 +192,10 @@ static void __cpuinit xen_smp_intr_exit(
if (cpu != 0)
local_teardown_timer(cpu);
- unbind_from_per_cpu_irq(resched_irq, cpu);
- unbind_from_per_cpu_irq(callfunc_irq, cpu);
- unbind_from_per_cpu_irq(call1func_irq, cpu);
- unbind_from_per_cpu_irq(reboot_irq, cpu);
+ unbind_from_per_cpu_irq(resched_irq, cpu, NULL);
+ unbind_from_per_cpu_irq(callfunc_irq, cpu, NULL);
+ unbind_from_per_cpu_irq(call1func_irq, cpu, NULL);
+ unbind_from_per_cpu_irq(reboot_irq, cpu, NULL);
xen_spinlock_cleanup(cpu);
}
#endif
--- sle11sp1-2010-03-22.orig/drivers/xen/core/spinlock.c 2010-02-23 14:25:31.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/core/spinlock.c 2010-02-23 12:31:40.000000000 +0100
@@ -55,7 +55,7 @@ int __cpuinit xen_spinlock_init(unsigned
void __cpuinit xen_spinlock_cleanup(unsigned int cpu)
{
- unbind_from_per_cpu_irq(spinlock_irq, cpu);
+ unbind_from_per_cpu_irq(spinlock_irq, cpu, NULL);
}
static unsigned int spin_adjust(struct spinning *spinning,
--- sle11sp1-2010-03-22.orig/drivers/xen/netback/netback.c 2009-11-06 10:52:23.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/netback/netback.c 2010-01-04 13:31:26.000000000 +0100
@@ -1619,6 +1619,12 @@ static irqreturn_t netif_be_dbg(int irq,
return IRQ_HANDLED;
}
+
+static struct irqaction netif_be_dbg_action = {
+ .handler = netif_be_dbg,
+ .flags = IRQF_SHARED,
+ .name = "net-be-dbg"
+};
#endif
static int __init netback_init(void)
@@ -1678,12 +1684,9 @@ static int __init netback_init(void)
netif_xenbus_init();
#ifdef NETBE_DEBUG_INTERRUPT
- (void)bind_virq_to_irqhandler(VIRQ_DEBUG,
- 0,
- netif_be_dbg,
- IRQF_SHARED,
- "net-be-dbg",
- &netif_be_dbg);
+ (void)bind_virq_to_irqaction(VIRQ_DEBUG,
+ 0,
+ &netif_be_dbg_action);
#endif
return 0;
--- sle11sp1-2010-03-22.orig/drivers/xen/xenoprof/xenoprofile.c 2010-01-07 09:59:32.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/xenoprof/xenoprofile.c 2010-01-07 11:04:10.000000000 +0100
@@ -210,6 +210,11 @@ static irqreturn_t xenoprof_ovf_interrup
return IRQ_HANDLED;
}
+static struct irqaction ovf_action = {
+ .handler = xenoprof_ovf_interrupt,
+ .flags = IRQF_DISABLED,
+ .name = "xenoprof"
+};
static void unbind_virq(void)
{
@@ -217,7 +222,7 @@ static void unbind_virq(void)
for_each_online_cpu(i) {
if (ovf_irq[i] >= 0) {
- unbind_from_irqhandler(ovf_irq[i], NULL);
+ unbind_from_per_cpu_irq(ovf_irq[i], i, &ovf_action);
ovf_irq[i] = -1;
}
}
@@ -230,12 +235,7 @@ static int bind_virq(void)
int result;
for_each_online_cpu(i) {
- result = bind_virq_to_irqhandler(VIRQ_XENOPROF,
- i,
- xenoprof_ovf_interrupt,
- IRQF_DISABLED|IRQF_NOBALANCING,
- "xenoprof",
- NULL);
+ result = bind_virq_to_irqaction(VIRQ_XENOPROF, i, &ovf_action);
if (result < 0) {
unbind_virq();
--- sle11sp1-2010-03-22.orig/include/xen/evtchn.h 2009-12-18 10:13:26.000000000 +0100
+++ sle11sp1-2010-03-22/include/xen/evtchn.h 2009-12-18 10:13:32.000000000 +0100
@@ -92,6 +92,17 @@ int bind_virq_to_irqhandler(
unsigned long irqflags,
const char *devname,
void *dev_id);
+#if defined(CONFIG_SMP) && defined(CONFIG_XEN) && defined(CONFIG_X86)
+int bind_virq_to_irqaction(
+ unsigned int virq,
+ unsigned int cpu,
+ struct irqaction *action);
+#else
+#define bind_virq_to_irqaction(virq, cpu, action) \
+ bind_virq_to_irqhandler(virq, cpu, (action)->handler, \
+ (action)->flags | IRQF_NOBALANCING, \
+ (action)->name, action)
+#endif
#if defined(CONFIG_SMP) && !defined(MODULE)
#ifndef CONFIG_X86
int bind_ipi_to_irqhandler(
@@ -116,9 +127,13 @@ int bind_ipi_to_irqaction(
*/
void unbind_from_irqhandler(unsigned int irq, void *dev_id);
-#if defined(CONFIG_SMP) && !defined(MODULE) && defined(CONFIG_X86)
+#if defined(CONFIG_SMP) && defined(CONFIG_XEN) && defined(CONFIG_X86)
/* Specialized unbind function for per-CPU IRQs. */
-void unbind_from_per_cpu_irq(unsigned int irq, unsigned int cpu);
+void unbind_from_per_cpu_irq(unsigned int irq, unsigned int cpu,
+ struct irqaction *);
+#else
+#define unbind_from_per_cpu_irq(irq, cpu, action) \
+ unbind_from_irqhandler(irq, action)
#endif
#ifndef CONFIG_XEN

143
xen-x86-bigmem Normal file

@@ -0,0 +1,143 @@
From: jbeulich@novell.com
Subject: fix issues with the assignment of huge amounts of memory
Patch-mainline: obsolete
References: bnc#482614, bnc#537435
--- sle11sp1-2010-02-09.orig/arch/x86/kernel/e820-xen.c 2009-12-04 11:31:40.000000000 +0100
+++ sle11sp1-2010-02-09/arch/x86/kernel/e820-xen.c 2009-12-04 12:11:12.000000000 +0100
@@ -1350,6 +1350,26 @@ static int __init parse_memopt(char *p)
userdef = 1;
mem_size = memparse(p, &p);
+#ifdef CONFIG_XEN
+ /*
+ * A little less than 2% of available memory is needed for page
+ * tables, p2m map, and mem_map. Hence the maximum amount of memory
+ * we can potentially balloon up to can in no case exceed about 50
+ * times of what we've been given initially. Since even with that we
+ * won't be able to boot (due to various calculations done based on
+ * the total number of pages) we further restrict this to factor 32.
+ */
+ if ((mem_size >> (PAGE_SHIFT + 5)) > xen_start_info->nr_pages) {
+ u64 size = (u64)xen_start_info->nr_pages << 5;
+
+ printk(KERN_WARNING "mem=%Luk is invalid for an initial"
+ " allocation of %luk, using %Luk\n",
+ (unsigned long long)mem_size >> 10,
+ xen_start_info->nr_pages << (PAGE_SHIFT - 10),
+ (unsigned long long)size << (PAGE_SHIFT - 10));
+ mem_size = size << PAGE_SHIFT;
+ }
+#endif
e820_remove_range(mem_size, ULLONG_MAX - mem_size, E820_RAM, 1);
i = e820.nr_map - 1;
@@ -1546,6 +1566,7 @@ void __init e820_reserve_resources_late(
char *__init default_machine_specific_memory_setup(void)
{
int rc, nr_map;
+ unsigned long long maxmem;
struct xen_memory_map memmap;
static struct e820entry __initdata map[E820MAX];
@@ -1571,6 +1592,22 @@ char *__init default_machine_specific_me
BUG();
#ifdef CONFIG_XEN
+ /* See the comment in parse_memopt(). */
+ for (maxmem = rc = 0; rc < e820.nr_map; ++rc)
+ if (e820.map[rc].type == E820_RAM)
+ maxmem += e820.map[rc].size;
+ if ((maxmem >> (PAGE_SHIFT + 5)) > xen_start_info->nr_pages) {
+ unsigned long long size = (u64)xen_start_info->nr_pages << 5;
+
+ printk(KERN_WARNING "maxmem of %LuM is invalid for an initial"
+ " allocation of %luM, using %LuM\n",
+ maxmem >> 20,
+ xen_start_info->nr_pages >> (20 - PAGE_SHIFT),
+ size >> (20 - PAGE_SHIFT));
+ size <<= PAGE_SHIFT;
+ e820_remove_range(size, ULLONG_MAX - size, E820_RAM, 1);
+ }
+
if (is_initial_xendomain()) {
memmap.nr_entries = E820MAX;
set_xen_guest_handle(memmap.buffer, machine_e820.map);
--- sle11sp1-2010-02-09.orig/arch/x86/kernel/setup-xen.c 2010-02-09 17:19:30.000000000 +0100
+++ sle11sp1-2010-02-09/arch/x86/kernel/setup-xen.c 2010-02-09 17:19:48.000000000 +0100
@@ -129,12 +129,7 @@ static struct notifier_block xen_panic_b
unsigned long *phys_to_machine_mapping;
EXPORT_SYMBOL(phys_to_machine_mapping);
-unsigned long *pfn_to_mfn_frame_list_list,
-#ifdef CONFIG_X86_64
- *pfn_to_mfn_frame_list[512];
-#else
- *pfn_to_mfn_frame_list[128];
-#endif
+unsigned long *pfn_to_mfn_frame_list_list, **pfn_to_mfn_frame_list;
/* Raw start-of-day parameters from the hypervisor. */
start_info_t *xen_start_info;
@@ -1153,17 +1148,17 @@ void __init setup_arch(char **cmdline_p)
p2m_pages = xen_start_info->nr_pages;
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
- unsigned long i, j;
+ unsigned long i, j, size;
unsigned int k, fpp;
/* Make sure we have a large enough P->M table. */
phys_to_machine_mapping = alloc_bootmem_pages(
max_pfn * sizeof(unsigned long));
- memset(phys_to_machine_mapping, ~0,
- max_pfn * sizeof(unsigned long));
memcpy(phys_to_machine_mapping,
(unsigned long *)xen_start_info->mfn_list,
p2m_pages * sizeof(unsigned long));
+ memset(phys_to_machine_mapping + p2m_pages, ~0,
+ (max_pfn - p2m_pages) * sizeof(unsigned long));
free_bootmem(
__pa(xen_start_info->mfn_list),
PFN_PHYS(PFN_UP(xen_start_info->nr_pages *
@@ -1173,15 +1168,26 @@ void __init setup_arch(char **cmdline_p)
* Initialise the list of the frames that specify the list of
* frames that make up the p2m table. Used by save/restore.
*/
- pfn_to_mfn_frame_list_list = alloc_bootmem_pages(PAGE_SIZE);
-
fpp = PAGE_SIZE/sizeof(unsigned long);
+ size = (max_pfn + fpp - 1) / fpp;
+ size = (size + fpp - 1) / fpp;
+ ++size; /* include a zero terminator for crash tools */
+ size *= sizeof(unsigned long);
+ pfn_to_mfn_frame_list_list = alloc_bootmem_pages(size);
+ if (size > PAGE_SIZE
+ && xen_create_contiguous_region((unsigned long)
+ pfn_to_mfn_frame_list_list,
+ get_order(size), 0))
+ BUG();
+ size -= sizeof(unsigned long);
+ pfn_to_mfn_frame_list = alloc_bootmem(size);
+
for (i = j = 0, k = -1; i < max_pfn; i += fpp, j++) {
if (j == fpp)
j = 0;
if (j == 0) {
k++;
- BUG_ON(k>=ARRAY_SIZE(pfn_to_mfn_frame_list));
+ BUG_ON(k * sizeof(unsigned long) >= size);
pfn_to_mfn_frame_list[k] =
alloc_bootmem_pages(PAGE_SIZE);
pfn_to_mfn_frame_list_list[k] =
--- sle11sp1-2010-02-09.orig/drivers/xen/core/machine_reboot.c 2009-12-18 14:19:13.000000000 +0100
+++ sle11sp1-2010-02-09/drivers/xen/core/machine_reboot.c 2009-12-18 14:15:04.000000000 +0100
@@ -79,7 +79,7 @@ static void post_suspend(int suspend_can
unsigned long shinfo_mfn;
extern unsigned long max_pfn;
extern unsigned long *pfn_to_mfn_frame_list_list;
- extern unsigned long *pfn_to_mfn_frame_list[];
+ extern unsigned long **pfn_to_mfn_frame_list;
if (suspend_cancelled) {
xen_start_info->store_mfn =

247
xen-x86-consistent-nmi Normal file

@@ -0,0 +1,247 @@
From: jbeulich@novell.com
Subject: make i386 and x86 NMI code consistent, disable all APIC-related stuff
Patch-mainline: obsolete
References: 191115
--- sle11sp1-2010-02-09.orig/arch/x86/include/asm/irq.h 2010-02-09 16:33:59.000000000 +0100
+++ sle11sp1-2010-02-09/arch/x86/include/asm/irq.h 2009-10-13 17:07:27.000000000 +0200
@@ -15,7 +15,7 @@ static inline int irq_canonicalize(int i
return ((irq == 2) ? 9 : irq);
}
-#ifdef CONFIG_X86_LOCAL_APIC
+#if defined(CONFIG_X86_LOCAL_APIC) && !defined(CONFIG_XEN)
# define ARCH_HAS_NMI_WATCHDOG
#endif
--- sle11sp1-2010-02-09.orig/arch/x86/include/asm/nmi.h 2010-02-09 16:33:59.000000000 +0100
+++ sle11sp1-2010-02-09/arch/x86/include/asm/nmi.h 2009-10-13 17:07:27.000000000 +0200
@@ -5,8 +5,6 @@
#include <asm/irq.h>
#include <asm/io.h>
-#ifdef ARCH_HAS_NMI_WATCHDOG
-
/**
* do_nmi_callback
*
@@ -16,6 +14,11 @@
int do_nmi_callback(struct pt_regs *regs, int cpu);
extern void die_nmi(char *str, struct pt_regs *regs, int do_panic);
+
+extern int unknown_nmi_panic;
+
+#ifdef ARCH_HAS_NMI_WATCHDOG
+
extern int check_nmi_watchdog(void);
extern int nmi_watchdog_enabled;
extern int avail_to_resrv_perfctr_nmi_bit(unsigned int);
@@ -42,7 +45,6 @@ extern unsigned int nmi_watchdog;
struct ctl_table;
extern int proc_nmi_enabled(struct ctl_table *, int ,
void __user *, size_t *, loff_t *);
-extern int unknown_nmi_panic;
void arch_trigger_all_cpu_backtrace(void);
#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
@@ -65,7 +67,6 @@ static inline int nmi_watchdog_active(vo
*/
return nmi_watchdog & (NMI_LOCAL_APIC | NMI_IO_APIC);
}
-#endif
void lapic_watchdog_stop(void);
int lapic_watchdog_init(unsigned nmi_hz);
@@ -73,6 +74,9 @@ int lapic_wd_event(unsigned nmi_hz);
unsigned lapic_adjust_nmi_hz(unsigned hz);
void disable_lapic_nmi_watchdog(void);
void enable_lapic_nmi_watchdog(void);
+
+#endif
+
void stop_nmi(void);
void restart_nmi(void);
--- sle11sp1-2010-02-09.orig/arch/x86/kernel/apic/Makefile 2009-11-06 10:52:02.000000000 +0100
+++ sle11sp1-2010-02-09/arch/x86/kernel/apic/Makefile 2009-10-13 17:07:27.000000000 +0200
@@ -18,8 +18,6 @@ obj-$(CONFIG_X86_NUMAQ) += numaq_32.o
obj-$(CONFIG_X86_ES7000) += es7000_32.o
obj-$(CONFIG_X86_SUMMIT) += summit_32.o
-obj-$(CONFIG_XEN) += nmi.o
-
probe_64-$(CONFIG_XEN) := probe_32.o
disabled-obj-$(CONFIG_XEN) := apic_flat_$(BITS).o
--- sle11sp1-2010-02-09.orig/arch/x86/kernel/apic/nmi.c 2009-11-06 10:51:42.000000000 +0100
+++ sle11sp1-2010-02-09/arch/x86/kernel/apic/nmi.c 2009-10-13 17:18:34.000000000 +0200
@@ -27,8 +27,10 @@
#include <linux/kdebug.h>
#include <linux/smp.h>
-#ifndef CONFIG_XEN
+#ifdef ARCH_HAS_NMI_WATCHDOG
#include <asm/i8259.h>
+#else
+#include <asm/nmi.h>
#endif
#include <asm/io_apic.h>
#include <asm/proto.h>
@@ -39,6 +41,9 @@
#include <asm/mach_traps.h>
int unknown_nmi_panic;
+
+#ifdef ARCH_HAS_NMI_WATCHDOG
+
int nmi_watchdog_enabled;
static cpumask_t backtrace_mask __read_mostly;
@@ -176,13 +181,11 @@ int __init check_nmi_watchdog(void)
kfree(prev_nmi_count);
return 0;
error:
-#ifndef CONFIG_XEN
if (nmi_watchdog == NMI_IO_APIC) {
if (!timer_through_8259)
disable_8259A_irq(0);
on_each_cpu(__acpi_nmi_disable, NULL, 1);
}
-#endif
#ifdef CONFIG_X86_32
timer_ack = 0;
@@ -472,8 +475,11 @@ nmi_watchdog_tick(struct pt_regs *regs,
return rc;
}
+#endif /* ARCH_HAS_NMI_WATCHDOG */
+
#ifdef CONFIG_SYSCTL
+#ifdef ARCH_HAS_NMI_WATCHDOG
static void enable_ioapic_nmi_watchdog_single(void *unused)
{
__get_cpu_var(wd_enabled) = 1;
@@ -491,6 +497,7 @@ static void disable_ioapic_nmi_watchdog(
{
on_each_cpu(stop_apic_nmi_watchdog, NULL, 1);
}
+#endif
static int __init setup_unknown_nmi_panic(char *str)
{
@@ -509,6 +516,7 @@ static int unknown_nmi_panic_callback(st
return 0;
}
+#ifdef ARCH_HAS_NMI_WATCHDOG
/*
* proc handler for /proc/sys/kernel/nmi
*/
@@ -546,6 +554,7 @@ int proc_nmi_enabled(struct ctl_table *t
}
return 0;
}
+#endif
#endif /* CONFIG_SYSCTL */
@@ -558,6 +567,7 @@ int do_nmi_callback(struct pt_regs *regs
return 0;
}
+#ifdef ARCH_HAS_NMI_WATCHDOG
void arch_trigger_all_cpu_backtrace(void)
{
int i;
@@ -574,3 +584,4 @@ void arch_trigger_all_cpu_backtrace(void
mdelay(1);
}
}
+#endif
--- sle11sp1-2010-02-09.orig/arch/x86/kernel/cpu/Makefile 2010-02-09 17:07:42.000000000 +0100
+++ sle11sp1-2010-02-09/arch/x86/kernel/cpu/Makefile 2010-02-09 17:19:39.000000000 +0100
@@ -33,7 +33,7 @@ obj-$(CONFIG_CPU_FREQ) += cpufreq/
obj-$(CONFIG_X86_LOCAL_APIC) += perfctr-watchdog.o
-disabled-obj-$(CONFIG_XEN) := hypervisor.o vmware.o sched.o
+disabled-obj-$(CONFIG_XEN) := hypervisor.o vmware.o sched.o perfctr-watchdog.o
quiet_cmd_mkcapflags = MKCAP $@
cmd_mkcapflags = $(PERL) $(srctree)/$(src)/mkcapflags.pl $< $@
--- sle11sp1-2010-02-09.orig/arch/x86/kernel/head-xen.c 2009-11-06 10:52:22.000000000 +0100
+++ sle11sp1-2010-02-09/arch/x86/kernel/head-xen.c 2009-10-15 15:32:46.000000000 +0200
@@ -179,12 +179,10 @@ void __init xen_arch_setup(void)
.address = CALLBACK_ADDR(system_call)
};
#endif
-#if defined(CONFIG_X86_LOCAL_APIC) || defined(CONFIG_X86_32)
static const struct callback_register __initconst nmi_cb = {
.type = CALLBACKTYPE_nmi,
.address = CALLBACK_ADDR(nmi)
};
-#endif
ret = HYPERVISOR_callback_op(CALLBACKOP_register, &event);
if (ret == 0)
@@ -208,7 +206,6 @@ void __init xen_arch_setup(void)
#endif
BUG_ON(ret);
-#if defined(CONFIG_X86_LOCAL_APIC) || defined(CONFIG_X86_32)
ret = HYPERVISOR_callback_op(CALLBACKOP_register, &nmi_cb);
#if CONFIG_XEN_COMPAT <= 0x030002
if (ret == -ENOSYS) {
@@ -219,6 +216,5 @@ void __init xen_arch_setup(void)
HYPERVISOR_nmi_op(XENNMI_register_callback, &cb);
}
#endif
-#endif
}
#endif /* CONFIG_XEN */
--- sle11sp1-2010-02-09.orig/arch/x86/kernel/traps-xen.c 2009-11-06 10:52:23.000000000 +0100
+++ sle11sp1-2010-02-09/arch/x86/kernel/traps-xen.c 2009-10-14 17:26:48.000000000 +0200
@@ -51,6 +51,7 @@
#include <asm/atomic.h>
#include <asm/system.h>
#include <asm/traps.h>
+#include <asm/nmi.h>
#include <asm/desc.h>
#include <asm/i387.h>
#include <asm/mce.h>
@@ -394,12 +395,14 @@ static notrace __kprobes void default_do
== NOTIFY_STOP)
return;
#ifdef CONFIG_X86_LOCAL_APIC
+#ifdef ARCH_HAS_NMI_WATCHDOG
/*
* Ok, so this is none of the documented NMI sources,
* so it must be the NMI watchdog.
*/
if (nmi_watchdog_tick(regs, reason))
return;
+#endif
if (!do_nmi_callback(regs, cpu))
unknown_nmi_error(reason, regs);
#else
--- sle11sp1-2010-02-09.orig/kernel/sysctl.c 2009-12-16 11:47:57.000000000 +0100
+++ sle11sp1-2010-02-09/kernel/sysctl.c 2009-12-16 12:15:35.000000000 +0100
@@ -790,6 +790,7 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = &proc_dointvec,
},
+#ifdef ARCH_HAS_NMI_WATCHDOG
{
.procname = "nmi_watchdog",
.data = &nmi_watchdog_enabled,
@@ -798,6 +799,7 @@ static struct ctl_table kern_table[] = {
.proc_handler = &proc_nmi_enabled,
},
#endif
+#endif
#if defined(CONFIG_X86)
{
.ctl_name = KERN_PANIC_ON_NMI,

xen-x86-dcr-fallback (new file, 168 lines)
Subject: Add fallback when XENMEM_exchange fails to replace contiguous region
From: jbeulich@novell.com
Patch-mainline: obsolete
References: 181869
This avoids losing precious special memory in places where any memory can be
used.
--- sle11sp1-2010-03-29.orig/arch/x86/mm/hypervisor.c 2009-11-06 10:52:02.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/mm/hypervisor.c 2009-06-09 15:52:17.000000000 +0200
@@ -43,6 +43,7 @@
#include <xen/interface/memory.h>
#include <linux/module.h>
#include <linux/percpu.h>
+#include <linux/highmem.h>
#include <asm/tlbflush.h>
#include <linux/highmem.h>
@@ -719,6 +720,83 @@ void xen_destroy_contiguous_region(unsig
BUG();
balloon_unlock(flags);
+
+ if (unlikely(!success)) {
+ /* Try hard to get the special memory back to Xen. */
+ exchange.in.extent_order = 0;
+ set_xen_guest_handle(exchange.in.extent_start, &in_frame);
+
+ for (i = 0; i < (1U<<order); i++) {
+ struct page *page = alloc_page(__GFP_HIGHMEM|__GFP_COLD);
+ unsigned long pfn;
+ mmu_update_t mmu;
+ unsigned int j = 0;
+
+ if (!page) {
+ printk(KERN_WARNING "Xen and kernel out of memory "
+ "while trying to release an order %u "
+ "contiguous region\n", order);
+ break;
+ }
+ pfn = page_to_pfn(page);
+
+ balloon_lock(flags);
+
+ if (!PageHighMem(page)) {
+ void *v = __va(pfn << PAGE_SHIFT);
+
+ scrub_pages(v, 1);
+ MULTI_update_va_mapping(cr_mcl + j, (unsigned long)v,
+ __pte_ma(0), UVMF_INVLPG|UVMF_ALL);
+ ++j;
+ }
+#ifdef CONFIG_XEN_SCRUB_PAGES
+ else {
+ scrub_pages(kmap(page), 1);
+ kunmap(page);
+ kmap_flush_unused();
+ }
+#endif
+
+ frame = pfn_to_mfn(pfn);
+ set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
+
+ MULTI_update_va_mapping(cr_mcl + j, vstart,
+ pfn_pte_ma(frame, PAGE_KERNEL),
+ UVMF_INVLPG|UVMF_ALL);
+ ++j;
+
+ pfn = __pa(vstart) >> PAGE_SHIFT;
+ set_phys_to_machine(pfn, frame);
+ if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+ mmu.ptr = ((uint64_t)frame << PAGE_SHIFT) | MMU_MACHPHYS_UPDATE;
+ mmu.val = pfn;
+ cr_mcl[j].op = __HYPERVISOR_mmu_update;
+ cr_mcl[j].args[0] = (unsigned long)&mmu;
+ cr_mcl[j].args[1] = 1;
+ cr_mcl[j].args[2] = 0;
+ cr_mcl[j].args[3] = DOMID_SELF;
+ ++j;
+ }
+
+ cr_mcl[j].op = __HYPERVISOR_memory_op;
+ cr_mcl[j].args[0] = XENMEM_decrease_reservation;
+ cr_mcl[j].args[1] = (unsigned long)&exchange.in;
+
+ if (HYPERVISOR_multicall(cr_mcl, j + 1))
+ BUG();
+ BUG_ON(cr_mcl[j].result != 1);
+ while (j--)
+ BUG_ON(cr_mcl[j].result != 0);
+
+ balloon_unlock(flags);
+
+ free_empty_pages(&page, 1);
+
+ in_frame++;
+ vstart += PAGE_SIZE;
+ }
+ }
}
EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
--- sle11sp1-2010-03-29.orig/drivers/xen/balloon/balloon.c 2010-03-31 10:00:17.000000000 +0200
+++ sle11sp1-2010-03-29/drivers/xen/balloon/balloon.c 2010-03-31 10:00:24.000000000 +0200
@@ -776,7 +776,11 @@ struct page **alloc_empty_pages_and_page
}
EXPORT_SYMBOL_GPL(alloc_empty_pages_and_pagevec);
-void free_empty_pages_and_pagevec(struct page **pagevec, int nr_pages)
+#endif /* CONFIG_XEN_BACKEND */
+
+#ifdef CONFIG_XEN
+static void _free_empty_pages_and_pagevec(struct page **pagevec, int nr_pages,
+ bool free_vec)
{
unsigned long flags;
int i;
@@ -787,17 +791,33 @@ void free_empty_pages_and_pagevec(struct
balloon_lock(flags);
for (i = 0; i < nr_pages; i++) {
BUG_ON(page_count(pagevec[i]) != 1);
- balloon_append(pagevec[i], 0);
+ balloon_append(pagevec[i], !free_vec);
+ }
+ if (!free_vec) {
+ bs.current_pages -= nr_pages;
+ totalram_pages = bs.current_pages - totalram_bias;
}
balloon_unlock(flags);
- kfree(pagevec);
+ if (free_vec)
+ kfree(pagevec);
schedule_work(&balloon_worker);
}
-EXPORT_SYMBOL_GPL(free_empty_pages_and_pagevec);
-#endif /* CONFIG_XEN_BACKEND */
+void free_empty_pages(struct page **pagevec, int nr_pages)
+{
+ _free_empty_pages_and_pagevec(pagevec, nr_pages, false);
+}
+#endif
+
+#if defined(CONFIG_XEN_BACKEND) || defined(CONFIG_XEN_BACKEND_MODULE)
+void free_empty_pages_and_pagevec(struct page **pagevec, int nr_pages)
+{
+ _free_empty_pages_and_pagevec(pagevec, nr_pages, true);
+}
+EXPORT_SYMBOL_GPL(free_empty_pages_and_pagevec);
+#endif
void balloon_release_driver_page(struct page *page)
{
--- sle11sp1-2010-03-29.orig/include/xen/balloon.h 2009-11-06 10:51:32.000000000 +0100
+++ sle11sp1-2010-03-29/include/xen/balloon.h 2009-06-09 15:52:17.000000000 +0200
@@ -47,6 +47,10 @@ void balloon_update_driver_allowance(lon
struct page **alloc_empty_pages_and_pagevec(int nr_pages);
void free_empty_pages_and_pagevec(struct page **pagevec, int nr_pages);
+/* Free an empty page range (not allocated through
+ alloc_empty_pages_and_pagevec), adding to the balloon. */
+void free_empty_pages(struct page **pagevec, int nr_pages);
+
void balloon_release_driver_page(struct page *page);
/*

xen-x86-exit-mmap (new file, 72 lines)
Subject: be more aggressive about de-activating mm-s under destruction
From: jbeulich@novell.com
Patch-mainline: obsolete
... by not only handling the current task on the CPU that arch_exit_mmap()
gets executed on, but also forcing remote CPUs to do the same.
--- sle11sp1-2010-03-22.orig/arch/x86/mm/pgtable-xen.c 2010-03-22 12:59:39.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/pgtable-xen.c 2010-03-22 12:59:47.000000000 +0100
@@ -1,5 +1,6 @@
#include <linux/mm.h>
#include <linux/module.h>
+#include <linux/smp.h>
#include <xen/features.h>
#include <asm/pgalloc.h>
#include <asm/pgtable.h>
@@ -437,27 +438,44 @@ void arch_dup_mmap(struct mm_struct *old
mm_pin(mm);
}
-void arch_exit_mmap(struct mm_struct *mm)
+/*
+ * We aggressively remove defunct pgd from cr3. We execute unmap_vmas() *much*
+ * faster this way, as no hypercalls are needed for the page table updates.
+ */
+static void leave_active_mm(struct task_struct *tsk, struct mm_struct *mm)
+ __releases(tsk->alloc_lock)
{
- struct task_struct *tsk = current;
-
- task_lock(tsk);
-
- /*
- * We aggressively remove defunct pgd from cr3. We execute unmap_vmas()
- * *much* faster this way, as no tlb flushes means bigger wrpt batches.
- */
if (tsk->active_mm == mm) {
tsk->active_mm = &init_mm;
atomic_inc(&init_mm.mm_count);
switch_mm(mm, &init_mm, tsk);
- atomic_dec(&mm->mm_count);
- BUG_ON(atomic_read(&mm->mm_count) == 0);
+ if (atomic_dec_and_test(&mm->mm_count))
+ BUG();
}
task_unlock(tsk);
+}
+
+static void _leave_active_mm(void *mm)
+{
+ struct task_struct *tsk = current;
+
+ if (spin_trylock(&tsk->alloc_lock))
+ leave_active_mm(tsk, mm);
+}
+
+void arch_exit_mmap(struct mm_struct *mm)
+{
+ struct task_struct *tsk = current;
+
+ task_lock(tsk);
+ leave_active_mm(tsk, mm);
+
+ preempt_disable();
+ smp_call_function_many(mm_cpumask(mm), _leave_active_mm, mm, 1);
+ preempt_enable();
if (PagePinned(virt_to_page(mm->pgd))
&& atomic_read(&mm->mm_count) == 1

xen-x86-machphys-prediction (new file, 204 lines)
From: jbeulich@novell.com
Subject: properly predict phys<->mach translations
Patch-mainline: obsolete
--- head-2009-07-28.orig/arch/x86/include/mach-xen/asm/maddr_32.h 2009-07-28 12:14:16.000000000 +0200
+++ head-2009-07-28/arch/x86/include/mach-xen/asm/maddr_32.h 2009-07-29 10:56:35.000000000 +0200
@@ -30,17 +30,19 @@ extern unsigned int machine_to_phys_or
static inline unsigned long pfn_to_mfn(unsigned long pfn)
{
- if (xen_feature(XENFEAT_auto_translated_physmap))
+ if (unlikely(xen_feature(XENFEAT_auto_translated_physmap)))
return pfn;
- BUG_ON(max_mapnr && pfn >= max_mapnr);
+ if (likely(max_mapnr))
+ BUG_ON(pfn >= max_mapnr);
return phys_to_machine_mapping[pfn] & ~FOREIGN_FRAME_BIT;
}
static inline int phys_to_machine_mapping_valid(unsigned long pfn)
{
- if (xen_feature(XENFEAT_auto_translated_physmap))
+ if (unlikely(xen_feature(XENFEAT_auto_translated_physmap)))
return 1;
- BUG_ON(max_mapnr && pfn >= max_mapnr);
+ if (likely(max_mapnr))
+ BUG_ON(pfn >= max_mapnr);
return (phys_to_machine_mapping[pfn] != INVALID_P2M_ENTRY);
}
@@ -48,7 +50,7 @@ static inline unsigned long mfn_to_pfn(u
{
unsigned long pfn;
- if (xen_feature(XENFEAT_auto_translated_physmap))
+ if (unlikely(xen_feature(XENFEAT_auto_translated_physmap)))
return mfn;
if (unlikely((mfn >> machine_to_phys_order) != 0))
@@ -95,17 +97,18 @@ static inline unsigned long mfn_to_pfn(u
static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
{
unsigned long pfn = mfn_to_pfn(mfn);
- if ((pfn < max_mapnr)
- && !xen_feature(XENFEAT_auto_translated_physmap)
- && (phys_to_machine_mapping[pfn] != mfn))
+ if (likely(pfn < max_mapnr)
+ && likely(!xen_feature(XENFEAT_auto_translated_physmap))
+ && unlikely(phys_to_machine_mapping[pfn] != mfn))
return max_mapnr; /* force !pfn_valid() */
return pfn;
}
static inline void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
{
- BUG_ON(max_mapnr && pfn >= max_mapnr);
- if (xen_feature(XENFEAT_auto_translated_physmap)) {
+ if (likely(max_mapnr))
+ BUG_ON(pfn >= max_mapnr);
+ if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
return;
}
--- head-2009-07-28.orig/arch/x86/include/mach-xen/asm/maddr_64.h 2009-07-28 12:14:16.000000000 +0200
+++ head-2009-07-28/arch/x86/include/mach-xen/asm/maddr_64.h 2009-07-29 10:56:35.000000000 +0200
@@ -25,17 +25,19 @@ extern unsigned int machine_to_phys_or
static inline unsigned long pfn_to_mfn(unsigned long pfn)
{
- if (xen_feature(XENFEAT_auto_translated_physmap))
+ if (unlikely(xen_feature(XENFEAT_auto_translated_physmap)))
return pfn;
- BUG_ON(max_mapnr && pfn >= max_mapnr);
+ if (likely(max_mapnr))
+ BUG_ON(pfn >= max_mapnr);
return phys_to_machine_mapping[pfn] & ~FOREIGN_FRAME_BIT;
}
static inline int phys_to_machine_mapping_valid(unsigned long pfn)
{
- if (xen_feature(XENFEAT_auto_translated_physmap))
+ if (unlikely(xen_feature(XENFEAT_auto_translated_physmap)))
return 1;
- BUG_ON(max_mapnr && pfn >= max_mapnr);
+ if (likely(max_mapnr))
+ BUG_ON(pfn >= max_mapnr);
return (phys_to_machine_mapping[pfn] != INVALID_P2M_ENTRY);
}
@@ -43,7 +45,7 @@ static inline unsigned long mfn_to_pfn(u
{
unsigned long pfn;
- if (xen_feature(XENFEAT_auto_translated_physmap))
+ if (unlikely(xen_feature(XENFEAT_auto_translated_physmap)))
return mfn;
if (unlikely((mfn >> machine_to_phys_order) != 0))
@@ -90,17 +92,18 @@ static inline unsigned long mfn_to_pfn(u
static inline unsigned long mfn_to_local_pfn(unsigned long mfn)
{
unsigned long pfn = mfn_to_pfn(mfn);
- if ((pfn < max_mapnr)
- && !xen_feature(XENFEAT_auto_translated_physmap)
- && (phys_to_machine_mapping[pfn] != mfn))
+ if (likely(pfn < max_mapnr)
+ && likely(!xen_feature(XENFEAT_auto_translated_physmap))
+ && unlikely(phys_to_machine_mapping[pfn] != mfn))
return max_mapnr; /* force !pfn_valid() */
return pfn;
}
static inline void set_phys_to_machine(unsigned long pfn, unsigned long mfn)
{
- BUG_ON(max_mapnr && pfn >= max_mapnr);
- if (xen_feature(XENFEAT_auto_translated_physmap)) {
+ if (likely(max_mapnr))
+ BUG_ON(pfn >= max_mapnr);
+ if (unlikely(xen_feature(XENFEAT_auto_translated_physmap))) {
BUG_ON(pfn != mfn && mfn != INVALID_P2M_ENTRY);
return;
}
--- head-2009-07-28.orig/arch/x86/include/mach-xen/asm/pgtable_types.h 2009-07-28 13:14:11.000000000 +0200
+++ head-2009-07-28/arch/x86/include/mach-xen/asm/pgtable_types.h 2009-07-29 10:56:35.000000000 +0200
@@ -207,7 +207,7 @@ typedef struct { pgdval_t pgd; } pgd_t;
#define __pgd_ma(x) ((pgd_t) { (x) } )
static inline pgd_t xen_make_pgd(pgdval_t val)
{
- if (val & _PAGE_PRESENT)
+ if (likely(val & _PAGE_PRESENT))
val = pte_phys_to_machine(val);
return (pgd_t) { val };
}
@@ -217,10 +217,10 @@ static inline pgdval_t xen_pgd_val(pgd_t
{
pgdval_t ret = __pgd_val(pgd);
#if PAGETABLE_LEVELS == 2 && CONFIG_XEN_COMPAT <= 0x030002
- if (ret)
+ if (likely(ret))
ret = machine_to_phys(ret) | _PAGE_PRESENT;
#else
- if (ret & _PAGE_PRESENT)
+ if (likely(ret & _PAGE_PRESENT))
ret = pte_machine_to_phys(ret);
#endif
return ret;
@@ -237,7 +237,7 @@ typedef struct { pudval_t pud; } pud_t;
#define __pud_ma(x) ((pud_t) { (x) } )
static inline pud_t xen_make_pud(pudval_t val)
{
- if (val & _PAGE_PRESENT)
+ if (likely(val & _PAGE_PRESENT))
val = pte_phys_to_machine(val);
return (pud_t) { val };
}
@@ -246,7 +246,7 @@ static inline pud_t xen_make_pud(pudval_
static inline pudval_t xen_pud_val(pud_t pud)
{
pudval_t ret = __pud_val(pud);
- if (ret & _PAGE_PRESENT)
+ if (likely(ret & _PAGE_PRESENT))
ret = pte_machine_to_phys(ret);
return ret;
}
@@ -266,7 +266,7 @@ typedef struct { pmdval_t pmd; } pmd_t;
#define __pmd_ma(x) ((pmd_t) { (x) } )
static inline pmd_t xen_make_pmd(pmdval_t val)
{
- if (val & _PAGE_PRESENT)
+ if (likely(val & _PAGE_PRESENT))
val = pte_phys_to_machine(val);
return (pmd_t) { val };
}
@@ -276,10 +276,10 @@ static inline pmdval_t xen_pmd_val(pmd_t
{
pmdval_t ret = __pmd_val(pmd);
#if CONFIG_XEN_COMPAT <= 0x030002
- if (ret)
+ if (likely(ret))
ret = pte_machine_to_phys(ret) | _PAGE_PRESENT;
#else
- if (ret & _PAGE_PRESENT)
+ if (likely(ret & _PAGE_PRESENT))
ret = pte_machine_to_phys(ret);
#endif
return ret;
@@ -308,7 +308,7 @@ static inline pmdval_t pmd_flags(pmd_t p
#define __pte_ma(x) ((pte_t) { .pte = (x) } )
static inline pte_t xen_make_pte(pteval_t val)
{
- if ((val & (_PAGE_PRESENT|_PAGE_IOMAP)) == _PAGE_PRESENT)
+ if (likely((val & (_PAGE_PRESENT|_PAGE_IOMAP)) == _PAGE_PRESENT))
val = pte_phys_to_machine(val);
return (pte_t) { .pte = val };
}
@@ -317,7 +317,7 @@ static inline pte_t xen_make_pte(pteval_
static inline pteval_t xen_pte_val(pte_t pte)
{
pteval_t ret = __pte_val(pte);
- if ((pte.pte_low & (_PAGE_PRESENT|_PAGE_IOMAP)) == _PAGE_PRESENT)
+ if (likely((pte.pte_low & (_PAGE_PRESENT|_PAGE_IOMAP)) == _PAGE_PRESENT))
ret = pte_machine_to_phys(ret);
return ret;
}

xen-x86-no-lapic (new file, 374 lines)
From: jbeulich@novell.com
Subject: Disallow all accesses to the local APIC page
Patch-mainline: obsolete
References: 191115
--- sle11sp1-2010-03-22.orig/arch/x86/include/asm/apic.h 2009-12-04 10:44:45.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/asm/apic.h 2009-10-13 17:19:31.000000000 +0200
@@ -10,7 +10,9 @@
#include <asm/processor.h>
#include <asm/apicdef.h>
#include <asm/atomic.h>
+#ifndef CONFIG_XEN
#include <asm/fixmap.h>
+#endif
#include <asm/mpspec.h>
#include <asm/system.h>
#include <asm/msr.h>
@@ -49,6 +51,7 @@ static inline void generic_apic_probe(vo
#ifdef CONFIG_X86_LOCAL_APIC
extern unsigned int apic_verbosity;
+#ifndef CONFIG_XEN
extern int local_apic_timer_c2_ok;
extern int disable_apic;
@@ -121,6 +124,8 @@ extern u64 native_apic_icr_read(void);
extern int x2apic_mode;
+#endif /* CONFIG_XEN */
+
#ifdef CONFIG_X86_X2APIC
/*
* Make previous memory operations globally visible before
@@ -367,6 +372,8 @@ struct apic {
*/
extern struct apic *apic;
+#ifndef CONFIG_XEN
+
/*
* APIC functionality to boot other CPUs - only used on SMP:
*/
@@ -460,6 +467,8 @@ static inline void default_wait_for_init
extern void generic_bigsmp_probe(void);
+#endif /* CONFIG_XEN */
+
#ifdef CONFIG_X86_LOCAL_APIC
@@ -479,6 +488,8 @@ static inline const struct cpumask *defa
DECLARE_EARLY_PER_CPU(u16, x86_bios_cpu_apicid);
+#ifndef CONFIG_XEN
+
static inline unsigned int read_apic_id(void)
{
unsigned int reg;
@@ -590,6 +601,8 @@ static inline physid_mask_t default_apic
return physid_mask_of_physid(phys_apicid);
}
+#endif /* CONFIG_XEN */
+
#endif /* CONFIG_X86_LOCAL_APIC */
#ifdef CONFIG_X86_32
--- sle11sp1-2010-03-22.orig/arch/x86/include/asm/apicdef.h 2010-03-22 12:07:53.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/asm/apicdef.h 2009-10-14 17:01:50.000000000 +0200
@@ -11,6 +11,8 @@
#define IO_APIC_DEFAULT_PHYS_BASE 0xfec00000
#define APIC_DEFAULT_PHYS_BASE 0xfee00000
+#ifndef CONFIG_XEN
+
#define APIC_ID 0x20
#define APIC_LVR 0x30
@@ -136,6 +138,16 @@
#define APIC_BASE_MSR 0x800
#define X2APIC_ENABLE (1UL << 10)
+#else /* CONFIG_XEN */
+
+enum {
+ APIC_DEST_ALLBUT = 0x1,
+ APIC_DEST_SELF,
+ APIC_DEST_ALLINC
+};
+
+#endif /* CONFIG_XEN */
+
#ifdef CONFIG_X86_32
# define MAX_IO_APICS 64
#else
@@ -143,6 +155,8 @@
# define MAX_LOCAL_APIC 32768
#endif
+#ifndef CONFIG_XEN
+
/*
* All x86-64 systems are xAPIC compatible.
* In the following, "apicid" is a physical APIC ID.
@@ -413,6 +427,8 @@ struct local_apic {
#undef u32
+#endif /* CONFIG_XEN */
+
#ifdef CONFIG_X86_32
#define BAD_APICID 0xFFu
#else
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/fixmap.h 2009-11-06 10:52:22.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/fixmap.h 2009-10-13 17:19:31.000000000 +0200
@@ -17,7 +17,6 @@
#ifndef __ASSEMBLY__
#include <linux/kernel.h>
#include <asm/acpi.h>
-#include <asm/apicdef.h>
#include <asm/page.h>
#ifdef CONFIG_X86_32
#include <linux/threads.h>
@@ -82,10 +81,10 @@ enum fixed_addresses {
#endif
FIX_DBGP_BASE,
FIX_EARLYCON_MEM_BASE,
+#ifndef CONFIG_XEN
#ifdef CONFIG_X86_LOCAL_APIC
FIX_APIC_BASE, /* local (CPU) APIC) -- required for SMP or not */
#endif
-#ifndef CONFIG_XEN
#ifdef CONFIG_X86_IO_APIC
FIX_IO_APIC_BASE_0,
FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS - 1,
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/smp.h 2009-11-20 11:18:10.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/smp.h 2009-11-20 11:20:18.000000000 +0100
@@ -15,7 +15,7 @@
# include <asm/io_apic.h>
# endif
#endif
-#include <asm/thread_info.h>
+#include <linux/thread_info.h>
#include <asm/cpumask.h>
extern int smp_num_siblings;
@@ -168,7 +168,7 @@ extern unsigned disabled_cpus __cpuinitd
#include <asm/smp-processor-id.h>
-#ifdef CONFIG_X86_LOCAL_APIC
+#if defined(CONFIG_X86_LOCAL_APIC) && !defined(CONFIG_XEN)
#ifndef CONFIG_X86_64
static inline int logical_smp_processor_id(void)
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/acpi/boot.c 2010-02-17 14:50:55.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/acpi/boot.c 2010-02-18 15:30:28.000000000 +0100
@@ -72,13 +72,13 @@ int acpi_sci_override_gsi __initdata;
#ifndef CONFIG_XEN
int acpi_skip_timer_override __initdata;
int acpi_use_timer_override __initdata;
-#else
-#define acpi_skip_timer_override 0
-#endif
#ifdef CONFIG_X86_LOCAL_APIC
static u64 acpi_lapic_addr __initdata = APIC_DEFAULT_PHYS_BASE;
#endif
+#else
+#define acpi_skip_timer_override 0
+#endif
#ifndef __HAVE_ARCH_CMPXCHG
#warning ACPI uses CMPXCHG, i486 and later hardware
@@ -137,6 +137,7 @@ static int __init acpi_parse_madt(struct
return -ENODEV;
}
+#ifndef CONFIG_XEN
if (madt->address) {
acpi_lapic_addr = (u64) madt->address;
@@ -144,7 +145,6 @@ static int __init acpi_parse_madt(struct
madt->address);
}
-#ifndef CONFIG_XEN
default_acpi_madt_oem_check(madt->header.oem_id,
madt->header.oem_table_id);
#endif
@@ -245,6 +245,7 @@ static int __init
acpi_parse_lapic_addr_ovr(struct acpi_subtable_header * header,
const unsigned long end)
{
+#ifndef CONFIG_XEN
struct acpi_madt_local_apic_override *lapic_addr_ovr = NULL;
lapic_addr_ovr = (struct acpi_madt_local_apic_override *)header;
@@ -253,6 +254,7 @@ acpi_parse_lapic_addr_ovr(struct acpi_su
return -EINVAL;
acpi_lapic_addr = lapic_addr_ovr->address;
+#endif
return 0;
}
@@ -1089,7 +1091,7 @@ int mp_register_gsi(struct device *dev,
ioapic_pin = mp_find_ioapic_pin(ioapic, gsi);
-#ifdef CONFIG_X86_32
+#if defined(CONFIG_X86_32) && !defined(CONFIG_XEN)
if (ioapic_renumber_irq)
gsi = ioapic_renumber_irq(ioapic, gsi);
#endif
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/apic/io_apic-xen.c 2010-03-22 12:52:03.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/apic/io_apic-xen.c 2010-03-22 12:59:10.000000000 +0100
@@ -1093,7 +1093,9 @@ static inline int irq_trigger(int idx)
return MPBIOS_trigger(idx);
}
+#ifndef CONFIG_XEN
int (*ioapic_renumber_irq)(int ioapic, int irq);
+#endif
static int pin_2_irq(int idx, int apic, int pin)
{
int irq, i;
@@ -1115,11 +1117,13 @@ static int pin_2_irq(int idx, int apic,
while (i < apic)
irq += nr_ioapic_registers[i++];
irq += pin;
+#ifndef CONFIG_XEN
/*
* For MPS mode, so far only needed by ES7000 platform
*/
if (ioapic_renumber_irq)
irq = ioapic_renumber_irq(apic, irq);
+#endif
}
#ifdef CONFIG_X86_32
@@ -4068,10 +4072,12 @@ int io_apic_set_pci_routing(struct devic
u8 __init io_apic_unique_id(u8 id)
{
#ifdef CONFIG_X86_32
+#ifndef CONFIG_XEN
if ((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) &&
!APIC_XAPIC(apic_version[boot_cpu_physical_apicid]))
return io_apic_get_unique_id(nr_ioapics, id);
else
+#endif
return id;
#else
int i;
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/irq-xen.c 2010-01-07 11:22:50.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/irq-xen.c 2009-12-18 10:14:24.000000000 +0100
@@ -15,9 +15,9 @@
#include <asm/mce.h>
#include <asm/hw_irq.h>
+#ifndef CONFIG_XEN
atomic_t irq_err_count;
-#ifndef CONFIG_XEN
/* Function pointer for generic interrupt vector handling */
void (*generic_interrupt_extension)(void) = NULL;
#endif
@@ -57,7 +57,7 @@ static int show_other_interrupts(struct
for_each_online_cpu(j)
seq_printf(p, "%10u ", irq_stats(j)->__nmi_count);
seq_printf(p, " Non-maskable interrupts\n");
-#ifdef CONFIG_X86_LOCAL_APIC
+#if defined(CONFIG_X86_LOCAL_APIC) && !defined(CONFIG_XEN)
seq_printf(p, "%*s: ", prec, "LOC");
for_each_online_cpu(j)
seq_printf(p, "%10u ", irq_stats(j)->apic_timer_irqs);
@@ -122,10 +122,12 @@ static int show_other_interrupts(struct
seq_printf(p, "%10u ", per_cpu(mce_poll_count, j));
seq_printf(p, " Machine check polls\n");
#endif
+#ifndef CONFIG_XEN
seq_printf(p, "%*s: %10u\n", prec, "ERR", atomic_read(&irq_err_count));
#if defined(CONFIG_X86_IO_APIC)
seq_printf(p, "%*s: %10u\n", prec, "MIS", atomic_read(&irq_mis_count));
#endif
+#endif
return 0;
}
@@ -221,12 +223,16 @@ u64 arch_irq_stat_cpu(unsigned int cpu)
u64 arch_irq_stat(void)
{
+#ifndef CONFIG_XEN
u64 sum = atomic_read(&irq_err_count);
#ifdef CONFIG_X86_IO_APIC
sum += atomic_read(&irq_mis_count);
#endif
return sum;
+#else
+ return 0;
+#endif
}
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/mpparse-xen.c 2010-03-01 14:45:20.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/mpparse-xen.c 2010-03-01 14:47:29.000000000 +0100
@@ -288,7 +288,9 @@ static int __init smp_check_mpc(struct m
printk(KERN_INFO "MPTABLE: Product ID: %s\n", str);
+#ifndef CONFIG_XEN
printk(KERN_INFO "MPTABLE: APIC at: 0x%X\n", mpc->lapic);
+#endif
return 1;
}
@@ -320,12 +322,14 @@ static int __init smp_read_mpc(struct mp
if (!smp_check_mpc(mpc, oem, str))
return 0;
-#if defined(CONFIG_X86_32) && !defined(CONFIG_XEN)
+#ifndef CONFIG_XEN
+#ifdef CONFIG_X86_32
generic_mps_oem_check(mpc, oem, str);
#endif
/* save the local APIC address, it might be non-default */
if (!acpi_lapic)
mp_lapic_addr = mpc->lapic;
+#endif
if (early)
return 1;
@@ -512,10 +516,12 @@ static inline void __init construct_defa
int linttypes[2] = { mp_ExtINT, mp_NMI };
int i;
+#ifndef CONFIG_XEN
/*
* local APIC has default address
*/
mp_lapic_addr = APIC_DEFAULT_PHYS_BASE;
+#endif
/*
* 2 CPUs, numbered 0 & 1.
@@ -648,10 +654,12 @@ void __init default_get_smp_config(unsig
*/
if (mpf->feature1 != 0) {
if (early) {
+#ifndef CONFIG_XEN
/*
* local APIC has default address
*/
mp_lapic_addr = APIC_DEFAULT_PHYS_BASE;
+#endif
return;
}
--- sle11sp1-2010-03-22.orig/drivers/xen/core/smpboot.c 2010-03-22 12:57:50.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/core/smpboot.c 2010-03-22 12:59:04.000000000 +0100
@@ -362,7 +362,7 @@ void __init smp_prepare_cpus(unsigned in
* Here we can be sure that there is an IO-APIC in the system. Let's
* go and set it up:
*/
- if (!skip_ioapic_setup && nr_ioapics)
+ if (cpu_has_apic && !skip_ioapic_setup && nr_ioapics)
setup_IO_APIC();
#endif
}

xen-x86-panic-no-reboot (new file, 32 lines)
From: jbeulich@novell.com
Subject: Don't automatically reboot Dom0 on panic (match native)
Patch-mainline: obsolete
$subject says it all.
--- sle11sp1-2010-02-09.orig/arch/x86/kernel/setup-xen.c 2010-02-09 17:12:56.000000000 +0100
+++ sle11sp1-2010-02-09/arch/x86/kernel/setup-xen.c 2010-02-09 17:19:30.000000000 +0100
@@ -781,15 +781,17 @@ void __init setup_arch(char **cmdline_p)
unsigned long p2m_pages;
struct physdev_set_iopl set_iopl;
+ if (!is_initial_xendomain()) {
#ifdef CONFIG_X86_32
- /* Force a quick death if the kernel panics (not domain 0). */
- extern int panic_timeout;
- if (!panic_timeout && !is_initial_xendomain())
- panic_timeout = 1;
+ /* Force a quick death if the kernel panics (not domain 0). */
+ extern int panic_timeout;
+ if (!panic_timeout)
+ panic_timeout = 1;
#endif
- /* Register a call for panic conditions. */
- atomic_notifier_chain_register(&panic_notifier_list, &xen_panic_block);
+ /* Register a call for panic conditions. */
+ atomic_notifier_chain_register(&panic_notifier_list, &xen_panic_block);
+ }
#endif /* CONFIG_XEN */
#ifdef CONFIG_X86_32

xen-x86-per-cpu-vcpu-info (new file, 638 lines)
From: jbeulich@novell.com
Subject: x86: use per-cpu storage for shared vcpu_info structure
Patch-mainline: obsolete
... reducing access code size and latency, as well as being a
prerequisite for removing the 32-vCPUs-per-guest limit.
--- sle11sp1-2010-03-29.orig/arch/x86/include/asm/percpu.h 2010-03-29 09:00:34.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/include/asm/percpu.h 2009-11-06 11:12:01.000000000 +0100
@@ -133,6 +133,38 @@ do { \
ret__; \
})
+#define percpu_xchg_op(op, var, val) \
+({ \
+ typedef typeof(var) T__; \
+ T__ ret__; \
+ if (0) \
+ ret__ = (val); \
+ switch (sizeof(var)) { \
+ case 1: \
+ asm(op "b %0,"__percpu_arg(1) \
+ : "=q" (ret__), "+m" (var) \
+ : "0" ((T__)(val))); \
+ break; \
+ case 2: \
+ asm(op "w %0,"__percpu_arg(1) \
+ : "=r" (ret__), "+m" (var) \
+ : "0" ((T__)(val))); \
+ break; \
+ case 4: \
+ asm(op "l %0,"__percpu_arg(1) \
+ : "=r" (ret__), "+m" (var) \
+ : "0" ((T__)(val))); \
+ break; \
+ case 8: \
+ asm(op "q %0,"__percpu_arg(1) \
+ : "=r" (ret__), "+m" (var) \
+ : "0" ((T__)(val))); \
+ break; \
+ default: __bad_percpu_size(); \
+ } \
+ ret__; \
+})
+
/*
* percpu_read() makes gcc load the percpu variable every time it is
* accessed while percpu_read_stable() allows the value to be cached.
@@ -152,6 +184,10 @@ do { \
#define percpu_and(var, val) percpu_to_op("and", per_cpu__##var, val)
#define percpu_or(var, val) percpu_to_op("or", per_cpu__##var, val)
#define percpu_xor(var, val) percpu_to_op("xor", per_cpu__##var, val)
+#define percpu_xchg(var, val) percpu_xchg_op("xchg", per_cpu__##var, val)
+#if defined(CONFIG_X86_XADD) || defined(CONFIG_X86_64)
+#define percpu_xadd(var, val) percpu_xchg_op("xadd", per_cpu__##var, val)
+#endif
/* This is not atomic against other CPUs -- CPU preemption needs to be off */
#define x86_test_and_clear_bit_percpu(bit, var) \
--- sle11sp1-2010-03-29.orig/arch/x86/include/mach-xen/asm/hypervisor.h 2009-11-23 10:49:39.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/include/mach-xen/asm/hypervisor.h 2009-11-23 10:53:45.000000000 +0100
@@ -50,12 +50,26 @@
extern shared_info_t *HYPERVISOR_shared_info;
+#ifdef CONFIG_XEN_VCPU_INFO_PLACEMENT
+DECLARE_PER_CPU(struct vcpu_info, vcpu_info);
+#define vcpu_info(cpu) (&per_cpu(vcpu_info, cpu))
+#define current_vcpu_info() (&__get_cpu_var(vcpu_info))
+#define vcpu_info_read(fld) percpu_read(vcpu_info.fld)
+#define vcpu_info_write(fld, val) percpu_write(vcpu_info.fld, val)
+#define vcpu_info_xchg(fld, val) percpu_xchg(vcpu_info.fld, val)
+void setup_vcpu_info(unsigned int cpu);
+void adjust_boot_vcpu_info(void);
+#else
#define vcpu_info(cpu) (HYPERVISOR_shared_info->vcpu_info + (cpu))
#ifdef CONFIG_SMP
#define current_vcpu_info() vcpu_info(smp_processor_id())
#else
#define current_vcpu_info() vcpu_info(0)
#endif
+#define vcpu_info_read(fld) (current_vcpu_info()->fld)
+#define vcpu_info_write(fld, val) (current_vcpu_info()->fld = (val))
+static inline void setup_vcpu_info(unsigned int cpu) {}
+#endif
#ifdef CONFIG_X86_32
extern unsigned long hypervisor_virt_start;
--- sle11sp1-2010-03-29.orig/arch/x86/include/mach-xen/asm/irqflags.h 2009-11-06 10:52:22.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/include/mach-xen/asm/irqflags.h 2009-11-06 11:12:01.000000000 +0100
@@ -12,7 +12,7 @@
* includes these barriers, for example.
*/
-#define xen_save_fl(void) (current_vcpu_info()->evtchn_upcall_mask)
+#define xen_save_fl(void) vcpu_info_read(evtchn_upcall_mask)
#define xen_restore_fl(f) \
do { \
@@ -28,7 +28,7 @@ do { \
#define xen_irq_disable() \
do { \
- current_vcpu_info()->evtchn_upcall_mask = 1; \
+ vcpu_info_write(evtchn_upcall_mask, 1); \
barrier(); \
} while (0)
@@ -90,8 +90,6 @@ static inline void halt(void)
#define evtchn_upcall_pending /* 0 */
#define evtchn_upcall_mask 1
-#define sizeof_vcpu_shift 6
-
#ifdef CONFIG_X86_64
# define __REG_si %rsi
# define __CPU_num PER_CPU_VAR(cpu_number)
@@ -100,6 +98,22 @@ static inline void halt(void)
# define __CPU_num TI_cpu(%ebp)
#endif
+#ifdef CONFIG_XEN_VCPU_INFO_PLACEMENT
+
+#define GET_VCPU_INFO PER_CPU(vcpu_info, __REG_si)
+#define __DISABLE_INTERRUPTS movb $1,PER_CPU_VAR(vcpu_info+evtchn_upcall_mask)
+#define __ENABLE_INTERRUPTS movb $0,PER_CPU_VAR(vcpu_info+evtchn_upcall_mask)
+#define __TEST_PENDING cmpb $0,PER_CPU_VAR(vcpu_info+evtchn_upcall_pending+0)
+#define DISABLE_INTERRUPTS(clb) __DISABLE_INTERRUPTS
+#define ENABLE_INTERRUPTS(clb) __ENABLE_INTERRUPTS
+
+#define __SIZEOF_DISABLE_INTERRUPTS 8
+#define __SIZEOF_TEST_PENDING 8
+
+#else /* CONFIG_XEN_VCPU_INFO_PLACEMENT */
+
+#define sizeof_vcpu_shift 6
+
#ifdef CONFIG_SMP
#define GET_VCPU_INFO movl __CPU_num,%esi ; \
shl $sizeof_vcpu_shift,%esi ; \
@@ -116,15 +130,21 @@ static inline void halt(void)
#define ENABLE_INTERRUPTS(clb) GET_VCPU_INFO ; \
__ENABLE_INTERRUPTS
+#define __SIZEOF_DISABLE_INTERRUPTS 4
+#define __SIZEOF_TEST_PENDING 3
+
+#endif /* CONFIG_XEN_VCPU_INFO_PLACEMENT */
+
#ifndef CONFIG_X86_64
#define INTERRUPT_RETURN iret
-#define ENABLE_INTERRUPTS_SYSEXIT __ENABLE_INTERRUPTS ; \
+#define ENABLE_INTERRUPTS_SYSEXIT \
+ movb $0,evtchn_upcall_mask(%esi) /* __ENABLE_INTERRUPTS */ ; \
sysexit_scrit: /**** START OF SYSEXIT CRITICAL REGION ****/ ; \
- __TEST_PENDING ; \
+ cmpb $0,evtchn_upcall_pending(%esi) /* __TEST_PENDING */ ; \
jnz 14f /* process more events if necessary... */ ; \
movl PT_ESI(%esp), %esi ; \
sysexit ; \
-14: __DISABLE_INTERRUPTS ; \
+14: movb $1,evtchn_upcall_mask(%esi) /* __DISABLE_INTERRUPTS */ ; \
TRACE_IRQS_OFF ; \
sysexit_ecrit: /**** END OF SYSEXIT CRITICAL REGION ****/ ; \
mov $__KERNEL_PERCPU, %ecx ; \
--- sle11sp1-2010-03-29.orig/arch/x86/include/mach-xen/asm/pgtable_64.h 2009-10-13 17:22:09.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/include/mach-xen/asm/pgtable_64.h 2009-11-06 11:12:01.000000000 +0100
@@ -117,6 +117,8 @@ static inline void xen_set_pgd(pgd_t *pg
#define __pte_mfn(_pte) (((_pte).pte & PTE_PFN_MASK) >> PAGE_SHIFT)
+extern unsigned long early_arbitrary_virt_to_mfn(void *va);
+
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
--- sle11sp1-2010-03-29.orig/arch/x86/include/mach-xen/asm/system.h 2009-11-06 10:52:22.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/include/mach-xen/asm/system.h 2009-11-06 11:12:01.000000000 +0100
@@ -233,8 +233,8 @@ static inline void xen_write_cr0(unsigne
asm volatile("mov %0,%%cr0": : "r" (val), "m" (__force_order));
}
-#define xen_read_cr2() (current_vcpu_info()->arch.cr2)
-#define xen_write_cr2(val) ((void)(current_vcpu_info()->arch.cr2 = (val)))
+#define xen_read_cr2() vcpu_info_read(arch.cr2)
+#define xen_write_cr2(val) vcpu_info_write(arch.cr2, val)
static inline unsigned long xen_read_cr3(void)
{
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/cpu/common-xen.c 2010-01-18 17:05:30.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/kernel/cpu/common-xen.c 2009-11-06 11:12:01.000000000 +0100
@@ -335,8 +335,16 @@ static const char *__cpuinit table_looku
__u32 cpu_caps_cleared[NCAPINTS] __cpuinitdata;
__u32 cpu_caps_set[NCAPINTS] __cpuinitdata;
-void load_percpu_segment(int cpu)
+void __ref load_percpu_segment(int cpu)
{
+#ifdef CONFIG_XEN_VCPU_INFO_PLACEMENT
+ static bool done;
+
+ if (!done) {
+ done = true;
+ adjust_boot_vcpu_info();
+ }
+#endif
#ifdef CONFIG_X86_32
loadsegment(fs, __KERNEL_PERCPU);
#else
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/entry_32-xen.S 2009-10-13 17:01:47.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/kernel/entry_32-xen.S 2009-11-06 11:12:01.000000000 +0100
@@ -463,6 +463,9 @@ sysenter_exit:
movl PT_EIP(%esp), %edx
movl PT_OLDESP(%esp), %ecx
xorl %ebp,%ebp
+#ifdef CONFIG_XEN_VCPU_INFO_PLACEMENT
+ GET_VCPU_INFO
+#endif
TRACE_IRQS_ON
1: mov PT_FS(%esp), %fs
PTGS_TO_GS
@@ -975,7 +978,9 @@ critical_region_fixup:
.section .rodata,"a"
critical_fixup_table:
- .byte -1,-1,-1 # testb $0xff,(%esi) = __TEST_PENDING
+ .rept __SIZEOF_TEST_PENDING
+ .byte -1
+ .endr
.byte -1,-1 # jnz 14f
.byte 0 # pop %ebx
.byte 1 # pop %ecx
@@ -994,7 +999,9 @@ critical_fixup_table:
.byte 10,10,10 # add $8,%esp
#endif
.byte 12 # iret
- .byte -1,-1,-1,-1 # movb $1,1(%esi) = __DISABLE_INTERRUPTS
+ .rept __SIZEOF_DISABLE_INTERRUPTS
+ .byte -1
+ .endr
.previous
# Hypervisor uses this for application faults while it executes.
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/head-xen.c 2009-10-15 15:32:46.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/kernel/head-xen.c 2009-11-06 11:12:01.000000000 +0100
@@ -151,6 +151,8 @@ void __init xen_start_kernel(void)
HYPERVISOR_shared_info = (shared_info_t *)fix_to_virt(FIX_SHARED_INFO);
memset(empty_zero_page, 0, sizeof(empty_zero_page));
+ setup_vcpu_info(0);
+
/* Set up mapping of lowest 1MB of physical memory. */
for (i = 0; i < NR_FIX_ISAMAPS; i++)
if (is_initial_xendomain())
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/time-xen.c 2010-02-04 09:43:52.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/kernel/time-xen.c 2010-02-09 17:20:06.000000000 +0100
@@ -276,16 +276,10 @@ static void get_time_values_from_xen(uns
local_irq_restore(flags);
}
-static inline int time_values_up_to_date(unsigned int cpu)
+static inline int time_values_up_to_date(void)
{
- struct vcpu_time_info *src;
- struct shadow_time_info *dst;
-
- src = &vcpu_info(cpu)->time;
- dst = &per_cpu(shadow_time, cpu);
-
rmb();
- return (dst->version == src->version);
+ return percpu_read(shadow_time.version) == vcpu_info_read(time.version);
}
static void sync_xen_wallclock(unsigned long dummy);
@@ -331,7 +325,7 @@ static unsigned long long local_clock(vo
local_time_version = shadow->version;
rdtsc_barrier();
time = shadow->system_timestamp + get_nsec_offset(shadow);
- if (!time_values_up_to_date(cpu))
+ if (!time_values_up_to_date())
get_time_values_from_xen(cpu);
barrier();
} while (local_time_version != shadow->version);
@@ -455,7 +449,7 @@ static irqreturn_t timer_interrupt(int i
delta_cpu -= per_cpu(processed_system_time, cpu);
get_runstate_snapshot(&runstate);
- } while (!time_values_up_to_date(cpu));
+ } while (!time_values_up_to_date());
if ((unlikely(delta < -(s64)permitted_clock_jitter) ||
unlikely(delta_cpu < -(s64)permitted_clock_jitter))
--- sle11sp1-2010-03-29.orig/arch/x86/mm/hypervisor.c 2009-12-11 15:27:37.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/mm/hypervisor.c 2010-01-05 16:47:18.000000000 +0100
@@ -41,6 +41,7 @@
#include <xen/balloon.h>
#include <xen/features.h>
#include <xen/interface/memory.h>
+#include <xen/interface/vcpu.h>
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/highmem.h>
@@ -50,7 +51,105 @@
EXPORT_SYMBOL(hypercall_page);
shared_info_t *__read_mostly HYPERVISOR_shared_info = (shared_info_t *)empty_zero_page;
+#ifndef CONFIG_XEN_VCPU_INFO_PLACEMENT
EXPORT_SYMBOL(HYPERVISOR_shared_info);
+#else
+DEFINE_PER_CPU(struct vcpu_info, vcpu_info) __aligned(sizeof(struct vcpu_info));
+EXPORT_PER_CPU_SYMBOL(vcpu_info);
+
+void __ref setup_vcpu_info(unsigned int cpu)
+{
+ struct vcpu_info *v = &per_cpu(vcpu_info, cpu);
+ struct vcpu_register_vcpu_info info;
+#ifdef CONFIG_X86_64
+ static bool first = true;
+
+ if (first) {
+ first = false;
+ info.mfn = early_arbitrary_virt_to_mfn(v);
+ } else
+#endif
+ info.mfn = arbitrary_virt_to_mfn(v);
+ info.offset = offset_in_page(v);
+
+ if (HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info,
+ cpu, &info))
+ BUG();
+}
+
+void __init adjust_boot_vcpu_info(void)
+{
+ unsigned long lpfn, rpfn, lmfn, rmfn;
+ pte_t *lpte, *rpte;
+ unsigned int level;
+ mmu_update_t mmu[2];
+
+ /*
+ * setup_vcpu_info() cannot be used more than once for a given (v)CPU,
+ * hence we must swap the underlying MFNs of the two pages holding old
+ * and new vcpu_info of the boot CPU.
+ *
+ * Do *not* use __get_cpu_var() or percpu_{write,...}() here, as the per-
+ * CPU segment didn't get reloaded yet. Using percpu_read(), as in
+ * arch_use_lazy_mmu_mode(), though undesirable, is safe except for the
+ * accesses to variables that were updated in setup_percpu_areas().
+ */
+ lpte = lookup_address((unsigned long)&per_cpu_var(vcpu_info)
+ + (__per_cpu_load - __per_cpu_start),
+ &level);
+ rpte = lookup_address((unsigned long)&per_cpu(vcpu_info, 0), &level);
+ BUG_ON(!lpte || !(pte_flags(*lpte) & _PAGE_PRESENT));
+ BUG_ON(!rpte || !(pte_flags(*rpte) & _PAGE_PRESENT));
+ lmfn = __pte_mfn(*lpte);
+ rmfn = __pte_mfn(*rpte);
+
+ if (lmfn == rmfn)
+ return;
+
+ lpfn = mfn_to_local_pfn(lmfn);
+ rpfn = mfn_to_local_pfn(rmfn);
+
+ printk(KERN_INFO
+ "Swapping MFNs for PFN %lx and %lx (MFN %lx and %lx)\n",
+ lpfn, rpfn, lmfn, rmfn);
+
+ xen_l1_entry_update(lpte, pfn_pte_ma(rmfn, pte_pgprot(*lpte)));
+ xen_l1_entry_update(rpte, pfn_pte_ma(lmfn, pte_pgprot(*rpte)));
+#ifdef CONFIG_X86_64
+ if (HYPERVISOR_update_va_mapping((unsigned long)__va(lpfn<<PAGE_SHIFT),
+ pfn_pte_ma(rmfn, PAGE_KERNEL_RO), 0))
+ BUG();
+#endif
+ if (HYPERVISOR_update_va_mapping((unsigned long)__va(rpfn<<PAGE_SHIFT),
+ pfn_pte_ma(lmfn, PAGE_KERNEL),
+ UVMF_TLB_FLUSH))
+ BUG();
+
+ set_phys_to_machine(lpfn, rmfn);
+ set_phys_to_machine(rpfn, lmfn);
+
+ mmu[0].ptr = ((uint64_t)lmfn << PAGE_SHIFT) | MMU_MACHPHYS_UPDATE;
+ mmu[0].val = rpfn;
+ mmu[1].ptr = ((uint64_t)rmfn << PAGE_SHIFT) | MMU_MACHPHYS_UPDATE;
+ mmu[1].val = lpfn;
+ if (HYPERVISOR_mmu_update(mmu, 2, NULL, DOMID_SELF))
+ BUG();
+
+ /*
+ * Copy over all contents of the page just replaced, except for the
+ * vcpu_info itself, as it may have got updated after having been
+ * copied from __per_cpu_load[].
+ */
+ memcpy(__va(rpfn << PAGE_SHIFT),
+ __va(lpfn << PAGE_SHIFT),
+ (unsigned long)&per_cpu_var(vcpu_info) & (PAGE_SIZE - 1));
+ level = (unsigned long)(&per_cpu_var(vcpu_info) + 1) & (PAGE_SIZE - 1);
+ if (level)
+ memcpy(__va(rpfn << PAGE_SHIFT) + level,
+ __va(lpfn << PAGE_SHIFT) + level,
+ PAGE_SIZE - level);
+}
+#endif
#define NR_MC BITS_PER_LONG
#define NR_MMU BITS_PER_LONG
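The comment block in adjust_boot_vcpu_info() above describes exchanging the machine frames (MFNs) behind two guest pages; the bookkeeping that must accompany such a swap can be sketched in isolation. Plain arrays stand in for the guest-maintained phys-to-machine table and the hypervisor's machine-to-phys table (the `MMU_MACHPHYS_UPDATE` pair); all names here are illustrative, not kernel APIs:

```c
#include <assert.h>

/* Illustrative sketch: after swapping the MFNs behind two PFNs, both
 * translation directions must be updated, or lookups go stale. */
enum { NPAGES = 4, NFRAMES = 16 };

static unsigned long p2m[NPAGES] = { 10, 11, 12, 13 }; /* pfn -> mfn */
static unsigned long m2p[NFRAMES];                     /* mfn -> pfn */

static void swap_frames(unsigned long lpfn, unsigned long rpfn)
{
    unsigned long lmfn = p2m[lpfn], rmfn = p2m[rpfn];

    p2m[lpfn] = rmfn;   /* cf. set_phys_to_machine(lpfn, rmfn) */
    p2m[rpfn] = lmfn;
    m2p[rmfn] = lpfn;   /* cf. the two MMU_MACHPHYS_UPDATE requests */
    m2p[lmfn] = rpfn;
}
```

The real function additionally remaps the kernel's direct-mapping PTEs for both pages, which this sketch elides.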
--- sle11sp1-2010-03-29.orig/arch/x86/mm/init_64-xen.c 2009-11-12 17:37:05.000000000 +0100
+++ sle11sp1-2010-03-29/arch/x86/mm/init_64-xen.c 2009-11-06 11:12:01.000000000 +0100
@@ -116,6 +116,26 @@ void __meminit early_make_page_readonly(
BUG();
}
+unsigned long __init early_arbitrary_virt_to_mfn(void *v)
+{
+ unsigned long va = (unsigned long)v, addr, *page;
+
+ BUG_ON(va < __START_KERNEL_map);
+
+ page = (void *)(xen_read_cr3() + __START_KERNEL_map);
+
+ addr = page[pgd_index(va)];
+ addr_to_page(addr, page);
+
+ addr = page[pud_index(va)];
+ addr_to_page(addr, page);
+
+ addr = page[pmd_index(va)];
+ addr_to_page(addr, page);
+
+ return (page[pte_index(va)] & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT;
+}
+
#ifndef CONFIG_XEN
static int __init parse_direct_gbpages_off(char *arg)
{
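early_arbitrary_virt_to_mfn() above resolves a virtual address by walking all four x86-64 paging levels by hand. The index arithmetic it relies on — 9 bits per level above the 12-bit page offset — can be sketched and checked in user space. The function name below is illustrative, though the bit widths match x86-64:

```c
#include <assert.h>

/* Index extraction for a 4-level x86-64 page-table walk.
 * level 0 = PTE index, 1 = PMD, 2 = PUD, 3 = PGD. */
#define PT_PAGE_SHIFT 12
#define PT_LEVEL_BITS 9

static unsigned long long table_index(unsigned long long va, int level)
{
    return (va >> (PT_PAGE_SHIFT + level * PT_LEVEL_BITS))
           & ((1ULL << PT_LEVEL_BITS) - 1);
}
```

For the classic kernel text address 0xffffffff81000000 this yields PGD slot 511 and PMD slot 8, matching where such a mapping lives in the upper half of the address space.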
--- sle11sp1-2010-03-29.orig/drivers/xen/Kconfig 2010-03-29 09:13:58.000000000 +0200
+++ sle11sp1-2010-03-29/drivers/xen/Kconfig 2010-03-29 09:14:20.000000000 +0200
@@ -366,6 +366,18 @@ config XEN_COMPAT
default 0x030002 if XEN_COMPAT_030002_AND_LATER
default 0
+config XEN_VCPU_INFO_PLACEMENT
+ bool "Place shared vCPU info in per-CPU storage"
+# depends on X86 && (XEN_COMPAT >= 0x00030101)
+ depends on X86
+ depends on !XEN_COMPAT_030002_AND_LATER
+ depends on !XEN_COMPAT_030004_AND_LATER
+ depends on !XEN_COMPAT_030100_AND_LATER
+ default SMP
+ ---help---
+ This allows faster access to the per-vCPU shared info
+ structure.
+
endmenu
config HAVE_IRQ_IGNORE_UNHANDLED
--- sle11sp1-2010-03-29.orig/drivers/xen/core/evtchn.c 2010-02-09 17:19:07.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/core/evtchn.c 2010-02-09 17:20:42.000000000 +0100
@@ -316,6 +316,24 @@ static DEFINE_PER_CPU(unsigned int, upca
static DEFINE_PER_CPU(unsigned int, current_l1i);
static DEFINE_PER_CPU(unsigned int, current_l2i);
+#ifndef vcpu_info_xchg
+#define vcpu_info_xchg(fld, val) xchg(&current_vcpu_info()->fld, val)
+#endif
+
+#ifndef percpu_xadd
+#define percpu_xadd(var, val) \
+({ \
+ typeof(per_cpu_var(var)) __tmp_var__; \
+ unsigned long flags; \
+ local_irq_save(flags); \
+ __tmp_var__ = get_cpu_var(var); \
+ __get_cpu_var(var) += (val); \
+ put_cpu_var(var); \
+ local_irq_restore(flags); \
+ __tmp_var__; \
+})
+#endif
+
/* NB. Interrupts are disabled on entry. */
asmlinkage void __irq_entry evtchn_do_upcall(struct pt_regs *regs)
{
@@ -324,25 +342,25 @@ asmlinkage void __irq_entry evtchn_do_up
unsigned long masked_l1, masked_l2;
unsigned int l1i, l2i, start_l1i, start_l2i, port, count, i;
int irq;
- vcpu_info_t *vcpu_info = current_vcpu_info();
exit_idle();
irq_enter();
do {
/* Avoid a callback storm when we reenable delivery. */
- vcpu_info->evtchn_upcall_pending = 0;
+ vcpu_info_write(evtchn_upcall_pending, 0);
/* Nested invocations bail immediately. */
- percpu_add(upcall_count, 1);
- if (unlikely(percpu_read(upcall_count) != 1))
+ if (unlikely(percpu_xadd(upcall_count, 1)))
break;
#ifndef CONFIG_X86 /* No need for a barrier -- XCHG is a barrier on x86. */
/* Clear master flag /before/ clearing selector flag. */
wmb();
+#else
+ barrier();
#endif
- l1 = xchg(&vcpu_info->evtchn_pending_sel, 0);
+ l1 = vcpu_info_xchg(evtchn_pending_sel, 0);
start_l1i = l1i = percpu_read(current_l1i);
start_l2i = percpu_read(current_l2i);
@@ -1369,7 +1387,6 @@ void unmask_evtchn(int port)
{
shared_info_t *s = HYPERVISOR_shared_info;
unsigned int cpu = smp_processor_id();
- vcpu_info_t *vcpu_info = &s->vcpu_info[cpu];
BUG_ON(!irqs_disabled());
@@ -1383,10 +1400,13 @@ void unmask_evtchn(int port)
synch_clear_bit(port, s->evtchn_mask);
/* Did we miss an interrupt 'edge'? Re-fire if so. */
- if (synch_test_bit(port, s->evtchn_pending) &&
- !synch_test_and_set_bit(port / BITS_PER_LONG,
- &vcpu_info->evtchn_pending_sel))
- vcpu_info->evtchn_upcall_pending = 1;
+ if (synch_test_bit(port, s->evtchn_pending)) {
+ vcpu_info_t *vcpu_info = current_vcpu_info();
+
+ if (!synch_test_and_set_bit(port / BITS_PER_LONG,
+ &vcpu_info->evtchn_pending_sel))
+ vcpu_info->evtchn_upcall_pending = 1;
+ }
}
EXPORT_SYMBOL_GPL(unmask_evtchn);
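The open-coded percpu_xadd() fallback above only works because fetch-and-add returns the value held *before* the increment, which is what lets evtchn_do_upcall() replace the percpu_add()/percpu_read() pair with a single nesting test. A minimal single-CPU sketch of that semantics (interrupt masking and per-CPU addressing elided, names hypothetical):

```c
#include <assert.h>

/* Model of fetch-and-add as used for upcall nesting detection: only
 * the outermost entry (old count == 0) proceeds; nested invocations
 * bail immediately. The real macro also disables interrupts around
 * the read-modify-write. */
static unsigned int upcall_count;

static unsigned int xadd_model(unsigned int *var, unsigned int val)
{
    unsigned int old = *var;  /* value before the add ...            */
    *var += val;
    return old;               /* ... is what the caller tests        */
}

static int enter_upcall(void)  /* 1 = outermost caller, 0 = nested */
{
    return xadd_model(&upcall_count, 1) == 0;
}

static void finish_upcall(void)
{
    upcall_count = 0;          /* handler done; reset for next IRQ */
}
```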
--- sle11sp1-2010-03-29.orig/drivers/xen/core/machine_reboot.c 2009-12-18 14:15:04.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/core/machine_reboot.c 2009-12-18 14:15:17.000000000 +0100
@@ -73,7 +73,7 @@ static void pre_suspend(void)
mfn_to_pfn(xen_start_info->console.domU.mfn);
}
-static void post_suspend(int suspend_cancelled)
+static void post_suspend(int suspend_cancelled, int fast_suspend)
{
int i, j, k, fpp;
unsigned long shinfo_mfn;
@@ -90,8 +90,21 @@ static void post_suspend(int suspend_can
#ifdef CONFIG_SMP
cpumask_copy(vcpu_initialized_mask, cpu_online_mask);
#endif
- for_each_possible_cpu(i)
+ for_each_possible_cpu(i) {
setup_runstate_area(i);
+
+#ifdef CONFIG_XEN_VCPU_INFO_PLACEMENT
+ if (fast_suspend && i != smp_processor_id()
+ && HYPERVISOR_vcpu_op(VCPUOP_down, i, NULL))
+ BUG();
+
+ setup_vcpu_info(i);
+
+ if (fast_suspend && i != smp_processor_id()
+ && HYPERVISOR_vcpu_op(VCPUOP_up, i, NULL))
+ BUG();
+#endif
+ }
}
shinfo_mfn = xen_start_info->shared_info >> PAGE_SHIFT;
@@ -133,7 +146,7 @@ static void post_suspend(int suspend_can
#define switch_idle_mm() ((void)0)
#define mm_pin_all() ((void)0)
#define pre_suspend() xen_pre_suspend()
-#define post_suspend(x) xen_post_suspend(x)
+#define post_suspend(x, f) xen_post_suspend(x)
#endif
@@ -164,7 +177,7 @@ static int take_machine_down(void *_susp
BUG_ON(suspend_cancelled > 0);
suspend->resume_notifier(suspend_cancelled);
if (suspend_cancelled >= 0) {
- post_suspend(suspend_cancelled);
+ post_suspend(suspend_cancelled, suspend->fast_suspend);
sysdev_resume();
}
if (!suspend_cancelled) {
--- sle11sp1-2010-03-29.orig/drivers/xen/core/smpboot.c 2010-03-22 12:59:04.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/core/smpboot.c 2010-03-22 12:59:52.000000000 +0100
@@ -369,8 +369,13 @@ void __init smp_prepare_cpus(unsigned in
void __init smp_prepare_boot_cpu(void)
{
+ unsigned int cpu;
+
switch_to_new_gdt(smp_processor_id());
prefill_possible_map();
+ for_each_possible_cpu(cpu)
+ if (cpu != smp_processor_id())
+ setup_vcpu_info(cpu);
}
#ifdef CONFIG_HOTPLUG_CPU
--- sle11sp1-2010-03-29.orig/drivers/xen/core/spinlock.c 2010-03-22 12:58:39.000000000 +0100
+++ sle11sp1-2010-03-29/drivers/xen/core/spinlock.c 2010-03-22 12:59:54.000000000 +0100
@@ -104,7 +104,7 @@ bool xen_spin_wait(raw_spinlock_t *lock,
spinning.prev = percpu_read(spinning);
smp_wmb();
percpu_write(spinning, &spinning);
- upcall_mask = current_vcpu_info()->evtchn_upcall_mask;
+ upcall_mask = vcpu_info_read(evtchn_upcall_mask);
do {
bool nested = false;
@@ -170,12 +170,12 @@ bool xen_spin_wait(raw_spinlock_t *lock,
* intended event processing will happen with the poll
* call.
*/
- current_vcpu_info()->evtchn_upcall_mask =
- nested ? upcall_mask : flags;
+ vcpu_info_write(evtchn_upcall_mask,
+ nested ? upcall_mask : flags);
xen_poll_irq(irq);
- current_vcpu_info()->evtchn_upcall_mask = upcall_mask;
+ vcpu_info_write(evtchn_upcall_mask, upcall_mask);
rc = !xen_test_irq_pending(irq);
if (!rc)

605
xen-x86-pmd-handling Normal file
View file
@@ -0,0 +1,605 @@
From: jbeulich@novell.com
Subject: consolidate pmd/pud/pgd entry handling
Patch-mainline: obsolete
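The pattern this patch consolidates is the same at every paging level: if the page holding the entry is pinned (and therefore mapped read-only under Xen), the write must go through a hypervisor-mediated update; otherwise a direct store suffices. A rough user-space model of that dispatch, with a counter standing in for the hypercall (all names illustrative, not kernel APIs):

```c
#include <assert.h>
#include <stdbool.h>

/* Entries in a PINNED table page are read-only to the guest and must
 * be updated via the hypervisor; entries in unpinned pages can be
 * stored directly. */
struct table_page {
    bool pinned;
    unsigned long slot;
    int hypercalls;
};

static void set_entry(struct table_page *pg, unsigned long val)
{
    if (pg->pinned) {
        pg->hypercalls++;   /* cf. set_pmd()/xen_l2_entry_update() */
        pg->slot = val;
    } else {
        pg->slot = val;     /* cf. the "*pmd = ent" direct store */
    }
}

static int demo(void)
{
    struct table_page unpinned = { .pinned = false };
    struct table_page pinned   = { .pinned = true };

    set_entry(&unpinned, 7);
    set_entry(&pinned, 7);
    /* exactly one hypervisor round-trip, for the pinned page */
    return pinned.hypercalls * 10 + unpinned.hypercalls;
}
```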
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/hypervisor.h 2009-11-23 10:45:08.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/hypervisor.h 2009-11-23 10:49:39.000000000 +0100
@@ -99,10 +99,12 @@ void xen_invlpg(unsigned long ptr);
void xen_l1_entry_update(pte_t *ptr, pte_t val);
void xen_l2_entry_update(pmd_t *ptr, pmd_t val);
void xen_l3_entry_update(pud_t *ptr, pud_t val); /* x86_64/PAE */
-void xen_l4_entry_update(pgd_t *ptr, pgd_t val); /* x86_64 only */
+void xen_l4_entry_update(pgd_t *ptr, int user, pgd_t val); /* x86_64 only */
void xen_pgd_pin(unsigned long ptr);
void xen_pgd_unpin(unsigned long ptr);
+void xen_init_pgd_pin(void);
+
void xen_set_ldt(const void *ptr, unsigned int ents);
#ifdef CONFIG_SMP
@@ -335,6 +337,18 @@ MULTI_update_va_mapping(
}
static inline void
+MULTI_mmu_update(multicall_entry_t *mcl, mmu_update_t *req,
+ unsigned int count, unsigned int *success_count,
+ domid_t domid)
+{
+ mcl->op = __HYPERVISOR_mmu_update;
+ mcl->args[0] = (unsigned long)req;
+ mcl->args[1] = count;
+ mcl->args[2] = (unsigned long)success_count;
+ mcl->args[3] = domid;
+}
+
+static inline void
MULTI_grant_table_op(multicall_entry_t *mcl, unsigned int cmd,
void *uop, unsigned int count)
{
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/pgalloc.h 2010-03-22 12:47:01.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/pgalloc.h 2010-03-22 12:59:30.000000000 +0100
@@ -75,20 +75,16 @@ static inline void pmd_populate(struct m
struct page *pte)
{
unsigned long pfn = page_to_pfn(pte);
+ pmd_t ent = __pmd(((pmdval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE);
paravirt_alloc_pte(mm, pfn);
- if (PagePinned(virt_to_page(mm->pgd))) {
- if (!PageHighMem(pte))
- BUG_ON(HYPERVISOR_update_va_mapping(
- (unsigned long)__va(pfn << PAGE_SHIFT),
- pfn_pte(pfn, PAGE_KERNEL_RO), 0));
-#ifndef CONFIG_X86_64
- else if (!TestSetPagePinned(pte))
- kmap_flush_unused();
+ if (PagePinned(virt_to_page(pmd))) {
+#ifndef CONFIG_HIGHPTE
+ BUG_ON(PageHighMem(pte));
#endif
- set_pmd(pmd, __pmd(((pmdval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE));
+ set_pmd(pmd, ent);
} else
- *pmd = __pmd(((pmdval_t)pfn << PAGE_SHIFT) | _PAGE_TABLE);
+ *pmd = ent;
}
#define pmd_pgtable(pmd) pmd_page(pmd)
@@ -116,39 +112,28 @@ extern void pud_populate(struct mm_struc
#else /* !CONFIG_X86_PAE */
static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
{
+ pud_t ent = __pud(_PAGE_TABLE | __pa(pmd));
+
paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
- if (unlikely(PagePinned(virt_to_page((mm)->pgd)))) {
- BUG_ON(HYPERVISOR_update_va_mapping(
- (unsigned long)pmd,
- pfn_pte(virt_to_phys(pmd)>>PAGE_SHIFT,
- PAGE_KERNEL_RO), 0));
- set_pud(pud, __pud(_PAGE_TABLE | __pa(pmd)));
- } else
- *pud = __pud(_PAGE_TABLE | __pa(pmd));
+ if (PagePinned(virt_to_page(pud)))
+ set_pud(pud, ent);
+ else
+ *pud = ent;
}
#endif /* CONFIG_X86_PAE */
#if PAGETABLE_LEVELS > 3
#define __user_pgd(pgd) ((pgd) + PTRS_PER_PGD)
-/*
- * We need to use the batch mode here, but pgd_pupulate() won't be
- * be called frequently.
- */
static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
{
+ pgd_t ent = __pgd(_PAGE_TABLE | __pa(pud));
+
paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
- if (unlikely(PagePinned(virt_to_page((mm)->pgd)))) {
- BUG_ON(HYPERVISOR_update_va_mapping(
- (unsigned long)pud,
- pfn_pte(virt_to_phys(pud)>>PAGE_SHIFT,
- PAGE_KERNEL_RO), 0));
- set_pgd(pgd, __pgd(_PAGE_TABLE | __pa(pud)));
- set_pgd(__user_pgd(pgd), __pgd(_PAGE_TABLE | __pa(pud)));
- } else {
- *(pgd) = __pgd(_PAGE_TABLE | __pa(pud));
- *__user_pgd(pgd) = *(pgd);
- }
+ if (unlikely(PagePinned(virt_to_page(pgd))))
+ xen_l4_entry_update(pgd, 1, ent);
+ else
+ *__user_pgd(pgd) = *pgd = ent;
}
static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/pgtable-3level.h 2009-11-06 10:52:02.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/pgtable-3level.h 2009-10-13 17:22:09.000000000 +0200
@@ -61,12 +61,15 @@ static inline void __xen_pte_clear(pte_t
ptep->pte_high = 0;
}
-static inline void xen_pmd_clear(pmd_t *pmd)
-{
- xen_l2_entry_update(pmd, __pmd(0));
-}
+#define xen_pmd_clear(pmd) \
+({ \
+ pmd_t *__pmdp = (pmd); \
+ PagePinned(virt_to_page(__pmdp)) \
+ ? set_pmd(__pmdp, __pmd(0)) \
+ : (void)(*__pmdp = __pmd(0)); \
+})
-static inline void pud_clear(pud_t *pudp)
+static inline void __xen_pud_clear(pud_t *pudp)
{
pgdval_t pgd;
@@ -87,6 +90,14 @@ static inline void pud_clear(pud_t *pudp
xen_tlb_flush();
}
+#define xen_pud_clear(pudp) \
+({ \
+ pud_t *__pudp = (pudp); \
+ PagePinned(virt_to_page(__pudp)) \
+ ? __xen_pud_clear(__pudp) \
+ : (void)(*__pudp = __pud(0)); \
+})
+
#ifdef CONFIG_SMP
static inline pte_t xen_ptep_get_and_clear(pte_t *ptep, pte_t res)
{
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/pgtable_64.h 2009-11-06 10:52:09.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/pgtable_64.h 2009-10-13 17:22:09.000000000 +0200
@@ -79,33 +79,41 @@ static inline void xen_set_pmd(pmd_t *pm
xen_l2_entry_update(pmdp, pmd);
}
-static inline void xen_pmd_clear(pmd_t *pmd)
-{
- xen_set_pmd(pmd, xen_make_pmd(0));
-}
+#define xen_pmd_clear(pmd) \
+({ \
+ pmd_t *__pmdp = (pmd); \
+ PagePinned(virt_to_page(__pmdp)) \
+ ? set_pmd(__pmdp, xen_make_pmd(0)) \
+ : (void)(*__pmdp = xen_make_pmd(0)); \
+})
static inline void xen_set_pud(pud_t *pudp, pud_t pud)
{
xen_l3_entry_update(pudp, pud);
}
-static inline void xen_pud_clear(pud_t *pud)
-{
- xen_set_pud(pud, xen_make_pud(0));
-}
+#define xen_pud_clear(pud) \
+({ \
+ pud_t *__pudp = (pud); \
+ PagePinned(virt_to_page(__pudp)) \
+ ? set_pud(__pudp, xen_make_pud(0)) \
+ : (void)(*__pudp = xen_make_pud(0)); \
+})
#define __user_pgd(pgd) ((pgd) + PTRS_PER_PGD)
static inline void xen_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
- xen_l4_entry_update(pgdp, pgd);
+ xen_l4_entry_update(pgdp, 0, pgd);
}
-static inline void xen_pgd_clear(pgd_t *pgd)
-{
- xen_set_pgd(pgd, xen_make_pgd(0));
- xen_set_pgd(__user_pgd(pgd), xen_make_pgd(0));
-}
+#define xen_pgd_clear(pgd) \
+({ \
+ pgd_t *__pgdp = (pgd); \
+ PagePinned(virt_to_page(__pgdp)) \
+ ? xen_l4_entry_update(__pgdp, 1, xen_make_pgd(0)) \
+ : (void)(*__user_pgd(__pgdp) = *__pgdp = xen_make_pgd(0)); \
+})
#define __pte_mfn(_pte) (((_pte).pte & PTE_PFN_MASK) >> PAGE_SHIFT)
--- sle11sp1-2010-03-22.orig/arch/x86/mm/hypervisor.c 2009-06-09 15:52:17.000000000 +0200
+++ sle11sp1-2010-03-22/arch/x86/mm/hypervisor.c 2009-12-11 15:27:37.000000000 +0100
@@ -360,31 +360,91 @@ void xen_l1_entry_update(pte_t *ptr, pte
}
EXPORT_SYMBOL_GPL(xen_l1_entry_update);
+static void do_lN_entry_update(mmu_update_t *mmu, unsigned int mmu_count,
+ struct page *page)
+{
+ if (likely(page)) {
+ multicall_entry_t mcl[2];
+ unsigned long pfn = page_to_pfn(page);
+
+ MULTI_update_va_mapping(mcl,
+ (unsigned long)__va(pfn << PAGE_SHIFT),
+ pfn_pte(pfn, PAGE_KERNEL_RO), 0);
+ SetPagePinned(page);
+ MULTI_mmu_update(mcl + 1, mmu, mmu_count, NULL, DOMID_SELF);
+ if (unlikely(HYPERVISOR_multicall_check(mcl, 2, NULL)))
+ BUG();
+ } else if (unlikely(HYPERVISOR_mmu_update(mmu, mmu_count,
+ NULL, DOMID_SELF) < 0))
+ BUG();
+}
+
void xen_l2_entry_update(pmd_t *ptr, pmd_t val)
{
mmu_update_t u;
+ struct page *page = NULL;
+
+ if (likely(pmd_present(val)) && likely(!pmd_large(val))
+ && likely(mem_map)
+ && likely(PagePinned(virt_to_page(ptr)))) {
+ page = pmd_page(val);
+ if (unlikely(PagePinned(page)))
+ page = NULL;
+ else if (PageHighMem(page)) {
+#ifndef CONFIG_HIGHPTE
+ BUG();
+#endif
+ kmap_flush_unused();
+ page = NULL;
+ }
+ }
u.ptr = virt_to_machine(ptr);
u.val = __pmd_val(val);
- BUG_ON(HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF) < 0);
+ do_lN_entry_update(&u, 1, page);
}
#if defined(CONFIG_X86_PAE) || defined(CONFIG_X86_64)
void xen_l3_entry_update(pud_t *ptr, pud_t val)
{
mmu_update_t u;
+ struct page *page = NULL;
+
+ if (likely(pud_present(val))
+#ifdef CONFIG_X86_64
+ && likely(!pud_large(val))
+#endif
+ && likely(mem_map)
+ && likely(PagePinned(virt_to_page(ptr)))) {
+ page = pud_page(val);
+ if (unlikely(PagePinned(page)))
+ page = NULL;
+ }
u.ptr = virt_to_machine(ptr);
u.val = __pud_val(val);
- BUG_ON(HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF) < 0);
+ do_lN_entry_update(&u, 1, page);
}
#endif
#ifdef CONFIG_X86_64
-void xen_l4_entry_update(pgd_t *ptr, pgd_t val)
+void xen_l4_entry_update(pgd_t *ptr, int user, pgd_t val)
{
- mmu_update_t u;
- u.ptr = virt_to_machine(ptr);
- u.val = __pgd_val(val);
- BUG_ON(HYPERVISOR_mmu_update(&u, 1, NULL, DOMID_SELF) < 0);
+ mmu_update_t u[2];
+ struct page *page = NULL;
+
+ if (likely(pgd_present(val)) && likely(mem_map)
+ && likely(PagePinned(virt_to_page(ptr)))) {
+ page = pgd_page(val);
+ if (unlikely(PagePinned(page)))
+ page = NULL;
+ }
+ u[0].ptr = virt_to_machine(ptr);
+ u[0].val = __pgd_val(val);
+ if (user) {
+ u[1].ptr = virt_to_machine(__user_pgd(ptr));
+ u[1].val = __pgd_val(val);
+ do_lN_entry_update(u, 2, page);
+ } else
+ do_lN_entry_update(u, 1, page);
}
#endif /* CONFIG_X86_64 */
--- sle11sp1-2010-03-22.orig/arch/x86/mm/init_32-xen.c 2010-03-11 09:32:10.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/init_32-xen.c 2009-10-13 17:22:09.000000000 +0200
@@ -748,6 +748,8 @@ static void __init zone_sizes_init(void)
#endif
free_area_init_nodes(max_zone_pfns);
+
+ xen_init_pgd_pin();
}
static unsigned long __init setup_node_bootmem(int nodeid,
@@ -1018,8 +1020,6 @@ void __init mem_init(void)
save_pg_dir();
zap_low_mappings(true);
-
- SetPagePinned(virt_to_page(init_mm.pgd));
}
#ifdef CONFIG_MEMORY_HOTPLUG
--- sle11sp1-2010-03-22.orig/arch/x86/mm/init_64-xen.c 2010-03-11 09:32:17.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/init_64-xen.c 2009-11-12 17:37:05.000000000 +0100
@@ -192,8 +192,11 @@ static pud_t *fill_pud(pgd_t *pgd, unsig
{
if (pgd_none(*pgd)) {
pud_t *pud = (pud_t *)spp_getpage();
- make_page_readonly(pud, XENFEAT_writable_page_tables);
- pgd_populate(&init_mm, pgd, pud);
+ if (!after_bootmem) {
+ make_page_readonly(pud, XENFEAT_writable_page_tables);
+ xen_l4_entry_update(pgd, __pgd(__pa(pud) | _PAGE_TABLE));
+ } else
+ pgd_populate(&init_mm, pgd, pud);
if (pud != pud_offset(pgd, 0))
printk(KERN_ERR "PAGETABLE BUG #00! %p <-> %p\n",
pud, pud_offset(pgd, 0));
@@ -205,8 +208,11 @@ static pmd_t *fill_pmd(pud_t *pud, unsig
{
if (pud_none(*pud)) {
pmd_t *pmd = (pmd_t *) spp_getpage();
- make_page_readonly(pmd, XENFEAT_writable_page_tables);
- pud_populate(&init_mm, pud, pmd);
+ if (!after_bootmem) {
+ make_page_readonly(pmd, XENFEAT_writable_page_tables);
+ xen_l3_entry_update(pud, __pud(__pa(pmd) | _PAGE_TABLE));
+ } else
+ pud_populate(&init_mm, pud, pmd);
if (pmd != pmd_offset(pud, 0))
printk(KERN_ERR "PAGETABLE BUG #01! %p <-> %p\n",
pmd, pmd_offset(pud, 0));
@@ -535,7 +541,6 @@ phys_pmd_init(pmd_t *pmd_page, unsigned
XENFEAT_writable_page_tables);
*pmd = __pmd(pte_phys | _PAGE_TABLE);
} else {
- make_page_readonly(pte, XENFEAT_writable_page_tables);
spin_lock(&init_mm.page_table_lock);
pmd_populate_kernel(&init_mm, pmd, __va(pte_phys));
spin_unlock(&init_mm.page_table_lock);
@@ -624,7 +629,6 @@ phys_pud_init(pud_t *pud_page, unsigned
else
*pud = __pud(pmd_phys | _PAGE_TABLE);
} else {
- make_page_readonly(pmd, XENFEAT_writable_page_tables);
spin_lock(&init_mm.page_table_lock);
pud_populate(&init_mm, pud, __va(pmd_phys));
spin_unlock(&init_mm.page_table_lock);
@@ -798,7 +802,6 @@ kernel_physical_mapping_init(unsigned lo
XENFEAT_writable_page_tables);
xen_l4_entry_update(pgd, __pgd(pud_phys | _PAGE_TABLE));
} else {
- make_page_readonly(pud, XENFEAT_writable_page_tables);
spin_lock(&init_mm.page_table_lock);
pgd_populate(&init_mm, pgd, __va(pud_phys));
spin_unlock(&init_mm.page_table_lock);
@@ -854,7 +857,7 @@ void __init paging_init(void)
free_area_init_nodes(max_zone_pfns);
- SetPagePinned(virt_to_page(init_mm.pgd));
+ xen_init_pgd_pin();
}
/*
--- sle11sp1-2010-03-22.orig/arch/x86/mm/pgtable-xen.c 2010-03-22 12:50:44.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/pgtable-xen.c 2010-03-22 12:59:39.000000000 +0100
@@ -65,16 +65,16 @@ early_param("userpte", setup_userpte);
void __pte_free(pgtable_t pte)
{
if (!PageHighMem(pte)) {
- unsigned long va = (unsigned long)page_address(pte);
- unsigned int level;
- pte_t *ptep = lookup_address(va, &level);
-
- BUG_ON(!ptep || level != PG_LEVEL_4K || !pte_present(*ptep));
- if (!pte_write(*ptep)
- && HYPERVISOR_update_va_mapping(va,
- mk_pte(pte, PAGE_KERNEL),
- 0))
- BUG();
+ if (PagePinned(pte)) {
+ unsigned long pfn = page_to_pfn(pte);
+
+ if (HYPERVISOR_update_va_mapping((unsigned long)__va(pfn << PAGE_SHIFT),
+ pfn_pte(pfn,
+ PAGE_KERNEL),
+ 0))
+ BUG();
+ ClearPagePinned(pte);
+ }
} else
#ifdef CONFIG_HIGHPTE
ClearPagePinned(pte);
@@ -116,14 +116,15 @@ pmd_t *pmd_alloc_one(struct mm_struct *m
void __pmd_free(pgtable_t pmd)
{
- unsigned long va = (unsigned long)page_address(pmd);
- unsigned int level;
- pte_t *ptep = lookup_address(va, &level);
-
- BUG_ON(!ptep || level != PG_LEVEL_4K || !pte_present(*ptep));
- if (!pte_write(*ptep)
- && HYPERVISOR_update_va_mapping(va, mk_pte(pmd, PAGE_KERNEL), 0))
- BUG();
+ if (PagePinned(pmd)) {
+ unsigned long pfn = page_to_pfn(pmd);
+
+ if (HYPERVISOR_update_va_mapping((unsigned long)__va(pfn << PAGE_SHIFT),
+ pfn_pte(pfn, PAGE_KERNEL),
+ 0))
+ BUG();
+ ClearPagePinned(pmd);
+ }
ClearPageForeign(pmd);
init_page_count(pmd);
@@ -211,21 +212,20 @@ static inline unsigned int pgd_walk_set_
{
unsigned long pfn = page_to_pfn(page);
- if (PageHighMem(page)) {
- if (pgprot_val(flags) & _PAGE_RW)
- ClearPagePinned(page);
- else
- SetPagePinned(page);
- } else {
- MULTI_update_va_mapping(per_cpu(pb_mcl, cpu) + seq,
- (unsigned long)__va(pfn << PAGE_SHIFT),
- pfn_pte(pfn, flags), 0);
- if (unlikely(++seq == PIN_BATCH)) {
- if (unlikely(HYPERVISOR_multicall_check(per_cpu(pb_mcl, cpu),
- PIN_BATCH, NULL)))
- BUG();
- seq = 0;
- }
+ if (pgprot_val(flags) & _PAGE_RW)
+ ClearPagePinned(page);
+ else
+ SetPagePinned(page);
+ if (PageHighMem(page))
+ return seq;
+ MULTI_update_va_mapping(per_cpu(pb_mcl, cpu) + seq,
+ (unsigned long)__va(pfn << PAGE_SHIFT),
+ pfn_pte(pfn, flags), 0);
+ if (unlikely(++seq == PIN_BATCH)) {
+ if (unlikely(HYPERVISOR_multicall_check(per_cpu(pb_mcl, cpu),
+ PIN_BATCH, NULL)))
+ BUG();
+ seq = 0;
}
return seq;
@@ -272,6 +272,16 @@ static void pgd_walk(pgd_t *pgd_base, pg
}
}
+#ifdef CONFIG_X86_PAE
+ for (; g < PTRS_PER_PGD; g++, pgd++) {
+ BUG_ON(pgd_none(*pgd));
+ pud = pud_offset(pgd, 0);
+ BUG_ON(pud_none(*pud));
+ pmd = pmd_offset(pud, 0);
+ seq = pgd_walk_set_prot(virt_to_page(pmd),flags,cpu,seq);
+ }
+#endif
+
mcl = per_cpu(pb_mcl, cpu);
#ifdef CONFIG_X86_64
if (unlikely(seq > PIN_BATCH - 2)) {
@@ -307,6 +317,51 @@ static void pgd_walk(pgd_t *pgd_base, pg
put_cpu();
}
+void __init xen_init_pgd_pin(void)
+{
+ pgd_t *pgd = init_mm.pgd;
+ pud_t *pud;
+ pmd_t *pmd;
+ unsigned int g, u, m;
+
+ if (xen_feature(XENFEAT_auto_translated_physmap))
+ return;
+
+ SetPagePinned(virt_to_page(pgd));
+ for (g = 0; g < PTRS_PER_PGD; g++, pgd++) {
+#ifndef CONFIG_X86_PAE
+ if (g >= pgd_index(HYPERVISOR_VIRT_START)
+ && g <= pgd_index(HYPERVISOR_VIRT_END - 1))
+ continue;
+#endif
+ if (!pgd_present(*pgd))
+ continue;
+ pud = pud_offset(pgd, 0);
+ if (PTRS_PER_PUD > 1) /* not folded */
+ SetPagePinned(virt_to_page(pud));
+ for (u = 0; u < PTRS_PER_PUD; u++, pud++) {
+ if (!pud_present(*pud))
+ continue;
+ pmd = pmd_offset(pud, 0);
+ if (PTRS_PER_PMD > 1) /* not folded */
+ SetPagePinned(virt_to_page(pmd));
+ for (m = 0; m < PTRS_PER_PMD; m++, pmd++) {
+#ifdef CONFIG_X86_PAE
+ if (g == pgd_index(HYPERVISOR_VIRT_START)
+ && m >= pmd_index(HYPERVISOR_VIRT_START))
+ continue;
+#endif
+ if (!pmd_present(*pmd))
+ continue;
+ SetPagePinned(pmd_page(*pmd));
+ }
+ }
+ }
+#ifdef CONFIG_X86_64
+ SetPagePinned(virt_to_page(level3_user_pgt));
+#endif
+}
+
static void __pgd_pin(pgd_t *pgd)
{
pgd_walk(pgd, PAGE_KERNEL_RO);
@@ -497,21 +552,18 @@ static void pgd_dtor(pgd_t *pgd)
void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
{
- struct page *page = virt_to_page(pmd);
- unsigned long pfn = page_to_pfn(page);
-
- paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
-
/* Note: almost everything apart from _PAGE_PRESENT is
reserved at the pmd (PDPT) level. */
- if (PagePinned(virt_to_page(mm->pgd))) {
- BUG_ON(PageHighMem(page));
- BUG_ON(HYPERVISOR_update_va_mapping(
- (unsigned long)__va(pfn << PAGE_SHIFT),
- pfn_pte(pfn, PAGE_KERNEL_RO), 0));
- set_pud(pudp, __pud(__pa(pmd) | _PAGE_PRESENT));
- } else
- *pudp = __pud(__pa(pmd) | _PAGE_PRESENT);
+ pud_t pud = __pud(__pa(pmd) | _PAGE_PRESENT);
+
+ paravirt_alloc_pmd(mm, page_to_pfn(virt_to_page(pmd)));
+
+ if (likely(!PagePinned(virt_to_page(pudp)))) {
+ *pudp = pud;
+ return;
+ }
+
+ set_pud(pudp, pud);
/*
* According to Intel App note "TLBs, Paging-Structure Caches,
@@ -606,13 +658,10 @@ static void pgd_prepopulate_pmd(struct m
i++, pud++, addr += PUD_SIZE) {
pmd_t *pmd = pmds[i];
- if (i >= KERNEL_PGD_BOUNDARY) {
+ if (i >= KERNEL_PGD_BOUNDARY)
memcpy(pmd,
(pmd_t *)pgd_page_vaddr(swapper_pg_dir[i]),
sizeof(pmd_t) * PTRS_PER_PMD);
- make_lowmem_page_readonly(
- pmd, XENFEAT_writable_page_tables);
- }
/* It is safe to poke machine addresses of pmds under the pgd_lock. */
pud_populate(mm, pud, pmd);

xen-x86-time-per-cpu: new file, 186 lines

@@ -0,0 +1,186 @@
From: jbeulich@novell.com
Subject: fold per-CPU accounting data into a structure
Patch-mainline: n/a
... to simplify generated code, especially in timer_interrupt(). This
becomes more important with more such data elements added (i.e. by
patches.xen/xen-x86-xtime-lock).
--- sle11sp1-2010-02-17.orig/arch/x86/kernel/time-xen.c 2010-02-18 17:30:48.000000000 +0100
+++ sle11sp1-2010-02-17/arch/x86/kernel/time-xen.c 2010-02-18 17:32:00.000000000 +0100
@@ -57,12 +57,15 @@ static u32 shadow_tv_version;
/* Keep track of last time we did processing/updating of jiffies and xtime. */
static u64 processed_system_time; /* System time (ns) at last processing. */
-static DEFINE_PER_CPU(u64, processed_system_time);
-static DEFINE_PER_CPU(u64, accounted_system_time);
-/* How much CPU time was spent blocked and how much was 'stolen'? */
-static DEFINE_PER_CPU(u64, processed_stolen_time);
-static DEFINE_PER_CPU(u64, processed_blocked_time);
+struct local_time_info {
+ u64 processed_system;
+ u64 accounted_system;
+ /* How much CPU time was spent blocked and how much was 'stolen'? */
+ u64 accounted_stolen;
+ u64 accounted_blocked;
+};
+static DEFINE_PER_CPU(struct local_time_info, local_time);
/* Current runstate of each CPU (updated automatically by the hypervisor). */
DEFINE_PER_CPU(struct vcpu_runstate_info, runstate);
@@ -440,6 +443,7 @@ static irqreturn_t timer_interrupt(int i
s64 delta, delta_cpu, stolen, blocked;
unsigned int i, cpu = smp_processor_id();
struct shadow_time_info *shadow = &per_cpu(shadow_time, cpu);
+ struct local_time_info *local = &per_cpu(local_time, cpu);
bool duty = false;
struct vcpu_runstate_info runstate;
@@ -468,7 +472,7 @@ static irqreturn_t timer_interrupt(int i
delta = delta_cpu =
shadow->system_timestamp + get_nsec_offset(shadow);
delta -= processed_system_time;
- delta_cpu -= per_cpu(processed_system_time, cpu);
+ delta_cpu -= local->processed_system;
get_runstate_snapshot(&runstate);
} while (!time_values_up_to_date());
@@ -482,10 +486,10 @@ static irqreturn_t timer_interrupt(int i
"processed=%Lx/%Lx\n",
cpu, delta, delta_cpu, shadow->system_timestamp,
get_nsec_offset(shadow), blocked,
- per_cpu(processed_system_time, cpu));
+ local->processed_system);
for_each_cpu_and(i, cpu_online_mask, cpumask_of(cpu))
printk(" %u: %Lx\n", i,
- per_cpu(processed_system_time, i));
+ per_cpu(local_time.processed_system, i));
}
} else if (unlikely(delta_cpu < -(s64)permitted_clock_jitter)) {
blocked = processed_system_time;
@@ -496,10 +500,10 @@ static irqreturn_t timer_interrupt(int i
" shadow=%Lx off=%Lx processed=%Lx/%Lx\n",
cpu, delta_cpu, shadow->system_timestamp,
get_nsec_offset(shadow), blocked,
- per_cpu(processed_system_time, cpu));
+ local->processed_system);
for_each_cpu_and(i, cpu_online_mask, cpumask_of(cpu))
printk(" %u: %Lx\n", i,
- per_cpu(processed_system_time, i));
+ per_cpu(local_time.processed_system, i));
}
} else if (duty) {
/* System-wide jiffy work. */
@@ -524,11 +528,10 @@ static irqreturn_t timer_interrupt(int i
}
delta = delta_cpu;
- delta_cpu += per_cpu(processed_system_time, cpu)
- - per_cpu(accounted_system_time, cpu);
+ delta_cpu += local->processed_system - local->accounted_system;
if (delta >= NS_PER_TICK) {
do_div(delta, NS_PER_TICK);
- per_cpu(processed_system_time, cpu) += delta * NS_PER_TICK;
+ local->processed_system += delta * NS_PER_TICK;
}
/*
@@ -537,14 +540,14 @@ static irqreturn_t timer_interrupt(int i
*/
stolen = runstate.time[RUNSTATE_runnable]
+ runstate.time[RUNSTATE_offline]
- - per_cpu(processed_stolen_time, cpu);
+ - local->accounted_stolen;
if ((stolen > 0) && (delta_cpu > 0)) {
delta_cpu -= stolen;
if (unlikely(delta_cpu < 0))
stolen += delta_cpu; /* clamp local-time progress */
do_div(stolen, NS_PER_TICK);
- per_cpu(processed_stolen_time, cpu) += stolen * NS_PER_TICK;
- per_cpu(accounted_system_time, cpu) += stolen * NS_PER_TICK;
+ local->accounted_stolen += stolen * NS_PER_TICK;
+ local->accounted_system += stolen * NS_PER_TICK;
account_steal_ticks(stolen);
}
@@ -553,21 +556,21 @@ static irqreturn_t timer_interrupt(int i
* ensures that the ticks are accounted as idle/wait.
*/
blocked = runstate.time[RUNSTATE_blocked]
- - per_cpu(processed_blocked_time, cpu);
+ - local->accounted_blocked;
if ((blocked > 0) && (delta_cpu > 0)) {
delta_cpu -= blocked;
if (unlikely(delta_cpu < 0))
blocked += delta_cpu; /* clamp local-time progress */
do_div(blocked, NS_PER_TICK);
- per_cpu(processed_blocked_time, cpu) += blocked * NS_PER_TICK;
- per_cpu(accounted_system_time, cpu) += blocked * NS_PER_TICK;
+ local->accounted_blocked += blocked * NS_PER_TICK;
+ local->accounted_system += blocked * NS_PER_TICK;
account_idle_ticks(blocked);
}
/* Account user/system ticks. */
if (delta_cpu > 0) {
do_div(delta_cpu, NS_PER_TICK);
- per_cpu(accounted_system_time, cpu) += delta_cpu * NS_PER_TICK;
+ local->accounted_system += delta_cpu * NS_PER_TICK;
if (user_mode_vm(get_irq_regs()))
account_user_time(current, (cputime_t)delta_cpu,
(cputime_t)delta_cpu);
@@ -606,9 +609,9 @@ static void init_missing_ticks_accountin
{
struct vcpu_runstate_info *runstate = setup_runstate_area(cpu);
- per_cpu(processed_blocked_time, cpu) =
+ per_cpu(local_time.accounted_blocked, cpu) =
runstate->time[RUNSTATE_blocked];
- per_cpu(processed_stolen_time, cpu) =
+ per_cpu(local_time.accounted_stolen, cpu) =
runstate->time[RUNSTATE_runnable] +
runstate->time[RUNSTATE_offline];
}
@@ -668,8 +671,8 @@ static void xen_clocksource_resume(void)
BUG();
}
get_time_values_from_xen(cpu);
- per_cpu(accounted_system_time, cpu) =
- per_cpu(processed_system_time, cpu) =
+ per_cpu(local_time.accounted_system, cpu) =
+ per_cpu(local_time.processed_system, cpu) =
per_cpu(shadow_time, 0).system_timestamp;
init_missing_ticks_accounting(cpu);
}
@@ -770,8 +773,8 @@ void __init time_init(void)
get_time_values_from_xen(0);
processed_system_time = per_cpu(shadow_time, 0).system_timestamp;
- per_cpu(processed_system_time, 0) = processed_system_time;
- per_cpu(accounted_system_time, 0) = processed_system_time;
+ per_cpu(local_time.processed_system, 0) = processed_system_time;
+ per_cpu(local_time.accounted_system, 0) = processed_system_time;
init_missing_ticks_accounting(0);
clocksource_register(&clocksource_xen);
@@ -849,7 +852,7 @@ static void stop_hz_timer(void)
singleshot.timeout_abs_ns = jiffies_to_st(j);
if (!singleshot.timeout_abs_ns)
return;
- local = per_cpu(processed_system_time, cpu);
+ local = per_cpu(local_time.processed_system, cpu);
if ((s64)(singleshot.timeout_abs_ns - local) <= NS_PER_TICK) {
cpumask_clear_cpu(cpu, nohz_cpu_mask);
singleshot.timeout_abs_ns = local + NS_PER_TICK;
@@ -918,8 +921,8 @@ int __cpuinit local_setup_timer(unsigned
do {
seq = read_seqbegin(&xtime_lock);
/* Use cpu0 timestamp: cpu's shadow is not initialised yet. */
- per_cpu(accounted_system_time, cpu) =
- per_cpu(processed_system_time, cpu) =
+ per_cpu(local_time.accounted_system, cpu) =
+ per_cpu(local_time.processed_system, cpu) =
per_cpu(shadow_time, 0).system_timestamp;
init_missing_ticks_accounting(cpu);
} while (read_seqretry(&xtime_lock, seq));

xen-x86-xtime-lock: new file, 245 lines

@@ -0,0 +1,245 @@
From: jbeulich@novell.com
Subject: reduce contention on xtime_lock
Patch-mainline: n/a
References: bnc#569014, bnc#571041, bnc#571769, bnc#572146
Especially on large systems the number of CPUs queueing up on
xtime_lock may become significant, and (as reported in the bugs above)
may even prevent proper operation of the system when Xen is using
deep C-states. There is, however, no need for all CPUs in the system
to update global time - it is sufficient to have a single (at any given
point in time) CPU being responsible for this.
Also, while touching that code, avoid calling printk() with xtime_lock
held.
--- sle11sp1-2010-02-17.orig/arch/x86/kernel/time-xen.c 2010-02-18 17:30:18.000000000 +0100
+++ sle11sp1-2010-02-17/arch/x86/kernel/time-xen.c 2010-02-18 17:33:07.000000000 +0100
@@ -58,6 +58,7 @@ static u32 shadow_tv_version;
/* Keep track of last time we did processing/updating of jiffies and xtime. */
static u64 processed_system_time; /* System time (ns) at last processing. */
static DEFINE_PER_CPU(u64, processed_system_time);
+static DEFINE_PER_CPU(u64, accounted_system_time);
/* How much CPU time was spent blocked and how much was 'stolen'? */
static DEFINE_PER_CPU(u64, processed_stolen_time);
@@ -123,6 +124,19 @@ static int __init __permitted_clock_jitt
__setup("permitted_clock_jitter=", __permitted_clock_jitter);
/*
+ * Limit on the number of CPUs that may concurrently attempt to acquire
+ * xtime_lock in timer_interrupt() (reducing contention potentially leading
+ * to a live lock on systems with many CPUs).
+ */
+static unsigned int __read_mostly duty_limit = -2;
+static int __init set_duty_limit(char *str)
+{
+ duty_limit = simple_strtoul(str, NULL, 0) - 1;
+ return 1;
+}
+__setup("timer_duty_limit=", set_duty_limit);
+
+/*
* Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
* yielding a 64-bit result.
*/
@@ -422,9 +436,11 @@ EXPORT_SYMBOL(profile_pc);
*/
static irqreturn_t timer_interrupt(int irq, void *dev_id)
{
+ static unsigned int contention_count;
s64 delta, delta_cpu, stolen, blocked;
unsigned int i, cpu = smp_processor_id();
struct shadow_time_info *shadow = &per_cpu(shadow_time, cpu);
+ bool duty = false;
struct vcpu_runstate_info runstate;
/* Keep nmi watchdog up to date */
@@ -437,7 +453,13 @@ static irqreturn_t timer_interrupt(int i
* the irq version of write_lock because as just said we have irq
* locally disabled. -arca
*/
- write_seqlock(&xtime_lock);
+ asm (LOCK_PREFIX "xaddl %1, %0"
+ : "+m" (contention_count), "=r" (i) : "1" (1));
+ if (i <= duty_limit) {
+ write_seqlock(&xtime_lock);
+ duty = true;
+ }
+ asm (LOCK_PREFIX "decl %0" : "+m" (contention_count));
do {
get_time_values_from_xen(cpu);
@@ -451,40 +473,63 @@ static irqreturn_t timer_interrupt(int i
get_runstate_snapshot(&runstate);
} while (!time_values_up_to_date());
- if ((unlikely(delta < -(s64)permitted_clock_jitter) ||
- unlikely(delta_cpu < -(s64)permitted_clock_jitter))
- && printk_ratelimit()) {
- printk("Timer ISR/%u: Time went backwards: "
- "delta=%lld delta_cpu=%lld shadow=%lld "
- "off=%lld processed=%lld cpu_processed=%lld\n",
- cpu, delta, delta_cpu, shadow->system_timestamp,
- (s64)get_nsec_offset(shadow),
- processed_system_time,
- per_cpu(processed_system_time, cpu));
- for (i = 0; i < num_online_cpus(); i++)
- printk(" %d: %lld\n", i,
- per_cpu(processed_system_time, i));
- }
+ if (duty && unlikely(delta < -(s64)permitted_clock_jitter)) {
+ blocked = processed_system_time;
+ write_sequnlock(&xtime_lock);
+ if (printk_ratelimit()) {
+ printk("Timer ISR/%u: Time went backwards: "
+ "delta=%Ld/%Ld shadow=%Lx off=%Lx "
+ "processed=%Lx/%Lx\n",
+ cpu, delta, delta_cpu, shadow->system_timestamp,
+ get_nsec_offset(shadow), blocked,
+ per_cpu(processed_system_time, cpu));
+ for_each_cpu_and(i, cpu_online_mask, cpumask_of(cpu))
+ printk(" %u: %Lx\n", i,
+ per_cpu(processed_system_time, i));
+ }
+ } else if (unlikely(delta_cpu < -(s64)permitted_clock_jitter)) {
+ blocked = processed_system_time;
+ if (duty)
+ write_sequnlock(&xtime_lock);
+ if (printk_ratelimit()) {
+ printk("Timer ISR/%u: Time went backwards: delta=%Ld"
+ " shadow=%Lx off=%Lx processed=%Lx/%Lx\n",
+ cpu, delta_cpu, shadow->system_timestamp,
+ get_nsec_offset(shadow), blocked,
+ per_cpu(processed_system_time, cpu));
+ for_each_cpu_and(i, cpu_online_mask, cpumask_of(cpu))
+ printk(" %u: %Lx\n", i,
+ per_cpu(processed_system_time, i));
+ }
+ } else if (duty) {
+ /* System-wide jiffy work. */
+ if (delta >= NS_PER_TICK) {
+ do_div(delta, NS_PER_TICK);
+ processed_system_time += delta * NS_PER_TICK;
+ while (delta > HZ) {
+ clobber_induction_variable(delta);
+ do_timer(HZ);
+ delta -= HZ;
+ }
+ do_timer(delta);
+ }
- /* System-wide jiffy work. */
- if (delta >= NS_PER_TICK) {
- do_div(delta, NS_PER_TICK);
- processed_system_time += delta * NS_PER_TICK;
- while (delta > HZ) {
- clobber_induction_variable(delta);
- do_timer(HZ);
- delta -= HZ;
+ if (shadow_tv_version != HYPERVISOR_shared_info->wc_version) {
+ update_wallclock();
+ if (keventd_up())
+ schedule_work(&clock_was_set_work);
}
- do_timer(delta);
- }
- if (shadow_tv_version != HYPERVISOR_shared_info->wc_version) {
- update_wallclock();
- if (keventd_up())
- schedule_work(&clock_was_set_work);
+ write_sequnlock(&xtime_lock);
}
- write_sequnlock(&xtime_lock);
+ delta = delta_cpu;
+ delta_cpu += per_cpu(processed_system_time, cpu)
+ - per_cpu(accounted_system_time, cpu);
+ if (delta >= NS_PER_TICK) {
+ do_div(delta, NS_PER_TICK);
+ per_cpu(processed_system_time, cpu) += delta * NS_PER_TICK;
+ }
/*
* Account stolen ticks.
@@ -499,7 +544,7 @@ static irqreturn_t timer_interrupt(int i
stolen += delta_cpu; /* clamp local-time progress */
do_div(stolen, NS_PER_TICK);
per_cpu(processed_stolen_time, cpu) += stolen * NS_PER_TICK;
- per_cpu(processed_system_time, cpu) += stolen * NS_PER_TICK;
+ per_cpu(accounted_system_time, cpu) += stolen * NS_PER_TICK;
account_steal_ticks(stolen);
}
@@ -515,14 +560,14 @@ static irqreturn_t timer_interrupt(int i
blocked += delta_cpu; /* clamp local-time progress */
do_div(blocked, NS_PER_TICK);
per_cpu(processed_blocked_time, cpu) += blocked * NS_PER_TICK;
- per_cpu(processed_system_time, cpu) += blocked * NS_PER_TICK;
+ per_cpu(accounted_system_time, cpu) += blocked * NS_PER_TICK;
account_idle_ticks(blocked);
}
/* Account user/system ticks. */
if (delta_cpu > 0) {
do_div(delta_cpu, NS_PER_TICK);
- per_cpu(processed_system_time, cpu) += delta_cpu * NS_PER_TICK;
+ per_cpu(accounted_system_time, cpu) += delta_cpu * NS_PER_TICK;
if (user_mode_vm(get_irq_regs()))
account_user_time(current, (cputime_t)delta_cpu,
(cputime_t)delta_cpu);
@@ -623,6 +668,7 @@ static void xen_clocksource_resume(void)
BUG();
}
get_time_values_from_xen(cpu);
+ per_cpu(accounted_system_time, cpu) =
per_cpu(processed_system_time, cpu) =
per_cpu(shadow_time, 0).system_timestamp;
init_missing_ticks_accounting(cpu);
@@ -725,6 +771,7 @@ void __init time_init(void)
processed_system_time = per_cpu(shadow_time, 0).system_timestamp;
per_cpu(processed_system_time, 0) = processed_system_time;
+ per_cpu(accounted_system_time, 0) = processed_system_time;
init_missing_ticks_accounting(0);
clocksource_register(&clocksource_xen);
@@ -735,6 +782,9 @@ void __init time_init(void)
/* Cannot request_irq() until kmem is initialised. */
late_time_init = setup_cpu0_timer_irq;
+
+ if (!(duty_limit + 2))
+ duty_limit = __fls(nr_cpu_ids);
}
/* Convert jiffies to system time. */
@@ -773,6 +823,7 @@ static void stop_hz_timer(void)
struct vcpu_set_singleshot_timer singleshot;
unsigned int cpu = smp_processor_id();
unsigned long j;
+ u64 local;
int rc;
cpumask_set_cpu(cpu, nohz_cpu_mask);
@@ -798,6 +849,11 @@ static void stop_hz_timer(void)
singleshot.timeout_abs_ns = jiffies_to_st(j);
if (!singleshot.timeout_abs_ns)
return;
+ local = per_cpu(processed_system_time, cpu);
+ if ((s64)(singleshot.timeout_abs_ns - local) <= NS_PER_TICK) {
+ cpumask_clear_cpu(cpu, nohz_cpu_mask);
+ singleshot.timeout_abs_ns = local + NS_PER_TICK;
+ }
singleshot.timeout_abs_ns += NS_PER_TICK / 2;
singleshot.flags = 0;
rc = HYPERVISOR_vcpu_op(VCPUOP_set_singleshot_timer, cpu, &singleshot);
@@ -862,6 +918,7 @@ int __cpuinit local_setup_timer(unsigned
do {
seq = read_seqbegin(&xtime_lock);
/* Use cpu0 timestamp: cpu's shadow is not initialised yet. */
+ per_cpu(accounted_system_time, cpu) =
per_cpu(processed_system_time, cpu) =
per_cpu(shadow_time, 0).system_timestamp;
init_missing_ticks_accounting(cpu);

xen-x86_64-dump-user-pgt: new file, 51 lines

@@ -0,0 +1,51 @@
From: jbeulich@novell.com
Subject: dump the correct page tables for user mode faults
Patch-mainline: obsolete
--- head-2009-10-12.orig/arch/x86/mm/fault-xen.c 2009-10-13 13:40:11.000000000 +0200
+++ head-2009-10-12/arch/x86/mm/fault-xen.c 2009-10-13 17:28:26.000000000 +0200
@@ -328,6 +328,7 @@ static void dump_pagetable(unsigned long
out:
printk(KERN_CONT "\n");
}
+#define dump_pagetable(addr, krnl) dump_pagetable(addr)
#else /* CONFIG_X86_64: */
@@ -452,7 +453,7 @@ static int bad_address(void *p)
return probe_kernel_address((unsigned long *)p, dummy);
}
-static void dump_pagetable(unsigned long address)
+static void dump_pagetable(unsigned long address, bool kernel)
{
pgd_t *base = __va(read_cr3() & PHYSICAL_PAGE_MASK);
pgd_t *pgd = base + pgd_index(address);
@@ -460,6 +461,9 @@ static void dump_pagetable(unsigned long
pmd_t *pmd;
pte_t *pte;
+ if (!kernel)
+ pgd = __user_pgd(base) + pgd_index(address);
+
if (bad_address(pgd))
goto bad;
@@ -598,7 +602,7 @@ show_fault_oops(struct pt_regs *regs, un
printk(KERN_ALERT "IP:");
printk_address(regs->ip, 1);
- dump_pagetable(address);
+ dump_pagetable(address, !(error_code & PF_USER));
}
static noinline void
@@ -615,7 +619,7 @@ pgtable_bad(struct pt_regs *regs, unsign
printk(KERN_ALERT "%s: Corrupted page table at address %lx\n",
tsk->comm, address);
- dump_pagetable(address);
+ dump_pagetable(address, !(error_code & PF_USER));
tsk->thread.cr2 = address;
tsk->thread.trap_no = 14;

xen-x86_64-note-init-p2m: new file, 343 lines

@@ -0,0 +1,343 @@
From: jbeulich@novell.com
Subject: eliminate scalability issues from initial mapping setup
Patch-mainline: obsolete
References: bnc#417417
Direct Xen to place the initial P->M table outside of the initial
mapping, as otherwise the 1G (implementation) / 2G (theoretical)
restriction on the size of the initial mapping limits the amount
of memory a domain can be handed initially.
Note that the flags passed to HYPERVISOR_update_va_mapping() from
__make_page_writable() and make_lowmem_page_writable() are
intentionally not including UVMF_ALL. This is intended to be an optimal
choice between the overhead of a potential spurious page fault (as
remote CPUs may still have read-only translations in their TLBs) and
the overhead of cross processor flushes. Flushing on the local CPU
shouldn't be as expensive (and hence can be viewed as an optimization
avoiding the spurious page fault on the local CPU), but is required
when the functions are used before the page fault handler gets set up.
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/head64-xen.c 2009-11-06 10:52:22.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/head64-xen.c 2009-12-04 12:12:10.000000000 +0100
@@ -123,6 +123,14 @@ void __init x86_64_start_reservations(ch
reserve_early(__pa_symbol(&_text), __pa_symbol(&__bss_stop), "TEXT DATA BSS");
+ if (xen_feature(XENFEAT_auto_translated_physmap))
+ xen_start_info->mfn_list = ~0UL;
+ else if (xen_start_info->mfn_list < __START_KERNEL_map)
+ reserve_early(xen_start_info->first_p2m_pfn << PAGE_SHIFT,
+ (xen_start_info->first_p2m_pfn
+ + xen_start_info->nr_p2m_frames) << PAGE_SHIFT,
+ "INITP2M");
+
/*
* At this point everything still needed from the boot loader
* or BIOS or kernel text should be early reserved or marked not
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/head_64-xen.S 2009-12-04 14:37:53.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/head_64-xen.S 2009-12-04 14:38:07.000000000 +0100
@@ -17,6 +17,7 @@
#include <linux/elfnote.h>
#include <asm/segment.h>
#include <asm/page.h>
+#include <asm/pgtable.h>
#include <asm/msr.h>
#include <asm/cache.h>
#include <asm/dwarf2.h>
@@ -146,6 +147,7 @@ ENTRY(empty_zero_page)
ELFNOTE(Xen, XEN_ELFNOTE_ENTRY, .quad startup_64)
ELFNOTE(Xen, XEN_ELFNOTE_HYPERCALL_PAGE, .quad hypercall_page)
ELFNOTE(Xen, XEN_ELFNOTE_L1_MFN_VALID, .quad _PAGE_PRESENT, _PAGE_PRESENT)
+ ELFNOTE(Xen, XEN_ELFNOTE_INIT_P2M, .quad VMEMMAP_START)
ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz "writable_page_tables|writable_descriptor_tables|auto_translated_physmap|pae_pgdir_above_4gb|supervisor_mode_kernel")
ELFNOTE(Xen, XEN_ELFNOTE_LOADER, .asciz "generic")
ELFNOTE(Xen, XEN_ELFNOTE_SUSPEND_CANCEL, .long 1)
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/setup-xen.c 2010-02-09 17:19:48.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/setup-xen.c 2010-02-10 16:12:19.000000000 +0100
@@ -1138,7 +1138,7 @@ void __init setup_arch(char **cmdline_p)
difference = xen_start_info->nr_pages - max_pfn;
set_xen_guest_handle(reservation.extent_start,
- ((unsigned long *)xen_start_info->mfn_list) + max_pfn);
+ phys_to_machine_mapping + max_pfn);
reservation.nr_extents = difference;
ret = HYPERVISOR_memory_op(XENMEM_decrease_reservation,
&reservation);
@@ -1155,14 +1155,86 @@ void __init setup_arch(char **cmdline_p)
phys_to_machine_mapping = alloc_bootmem_pages(
max_pfn * sizeof(unsigned long));
memcpy(phys_to_machine_mapping,
- (unsigned long *)xen_start_info->mfn_list,
+ __va(__pa(xen_start_info->mfn_list)),
p2m_pages * sizeof(unsigned long));
memset(phys_to_machine_mapping + p2m_pages, ~0,
(max_pfn - p2m_pages) * sizeof(unsigned long));
- free_bootmem(
- __pa(xen_start_info->mfn_list),
- PFN_PHYS(PFN_UP(xen_start_info->nr_pages *
- sizeof(unsigned long))));
+
+#ifdef CONFIG_X86_64
+ if (xen_start_info->mfn_list == VMEMMAP_START) {
+ /*
+ * Since it is well isolated we can (and since it is
+ * perhaps large we should) also free the page tables
+ * mapping the initial P->M table.
+ */
+ unsigned long va = VMEMMAP_START, pa;
+ pgd_t *pgd = pgd_offset_k(va);
+ pud_t *pud_page = pud_offset(pgd, 0);
+
+ BUILD_BUG_ON(VMEMMAP_START & ~PGDIR_MASK);
+ xen_l4_entry_update(pgd, __pgd(0));
+ for(;;) {
+ pud_t *pud = pud_page + pud_index(va);
+
+ if (pud_none(*pud))
+ va += PUD_SIZE;
+ else if (pud_large(*pud)) {
+ pa = pud_val(*pud) & PHYSICAL_PAGE_MASK;
+ make_pages_writable(__va(pa),
+ PUD_SIZE >> PAGE_SHIFT,
+ XENFEAT_writable_page_tables);
+ free_bootmem(pa, PUD_SIZE);
+ va += PUD_SIZE;
+ } else {
+ pmd_t *pmd = pmd_offset(pud, va);
+
+ if (pmd_large(*pmd)) {
+ pa = pmd_val(*pmd) & PHYSICAL_PAGE_MASK;
+ make_pages_writable(__va(pa),
+ PMD_SIZE >> PAGE_SHIFT,
+ XENFEAT_writable_page_tables);
+ free_bootmem(pa, PMD_SIZE);
+ } else if (!pmd_none(*pmd)) {
+ pte_t *pte = pte_offset_kernel(pmd, va);
+
+ for (i = 0; i < PTRS_PER_PTE; ++i) {
+ if (pte_none(pte[i]))
+ break;
+ pa = pte_pfn(pte[i]) << PAGE_SHIFT;
+ make_page_writable(__va(pa),
+ XENFEAT_writable_page_tables);
+ free_bootmem(pa, PAGE_SIZE);
+ }
+ ClearPagePinned(virt_to_page(pte));
+ make_page_writable(pte,
+ XENFEAT_writable_page_tables);
+ free_bootmem(__pa(pte), PAGE_SIZE);
+ }
+ va += PMD_SIZE;
+ if (pmd_index(va))
+ continue;
+ ClearPagePinned(virt_to_page(pmd));
+ make_page_writable(pmd,
+ XENFEAT_writable_page_tables);
+ free_bootmem(__pa((unsigned long)pmd
+ & PAGE_MASK),
+ PAGE_SIZE);
+ }
+ if (!pud_index(va))
+ break;
+ }
+ ClearPagePinned(virt_to_page(pud_page));
+ make_page_writable(pud_page,
+ XENFEAT_writable_page_tables);
+ free_bootmem(__pa((unsigned long)pud_page & PAGE_MASK),
+ PAGE_SIZE);
+ } else if (!WARN_ON(xen_start_info->mfn_list
+ < __START_KERNEL_map))
+#endif
+ free_bootmem(__pa(xen_start_info->mfn_list),
+ PFN_PHYS(PFN_UP(xen_start_info->nr_pages *
+ sizeof(unsigned long))));
+
/*
* Initialise the list of the frames that specify the list of
--- sle11sp1-2010-03-22.orig/arch/x86/mm/init-xen.c 2009-11-06 10:52:23.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/init-xen.c 2009-12-04 12:12:10.000000000 +0100
@@ -347,9 +347,22 @@ unsigned long __init_refok init_memory_m
__flush_tlb_all();
- if (!after_bootmem && e820_table_top > e820_table_start)
+ if (!after_bootmem && e820_table_top > e820_table_start) {
+#ifdef CONFIG_X86_64
+ if (xen_start_info->mfn_list < __START_KERNEL_map
+ && e820_table_start <= xen_start_info->first_p2m_pfn
+ && e820_table_top > xen_start_info->first_p2m_pfn) {
+ reserve_early(e820_table_start << PAGE_SHIFT,
+ xen_start_info->first_p2m_pfn
+ << PAGE_SHIFT,
+ "PGTABLE");
+ e820_table_start = xen_start_info->first_p2m_pfn
+ + xen_start_info->nr_p2m_frames;
+ }
+#endif
reserve_early(e820_table_start << PAGE_SHIFT,
e820_table_top << PAGE_SHIFT, "PGTABLE");
+ }
if (!after_bootmem)
early_memtest(start, end);
--- sle11sp1-2010-03-22.orig/arch/x86/mm/init_64-xen.c 2009-12-04 12:11:43.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/init_64-xen.c 2009-12-04 12:12:10.000000000 +0100
@@ -181,6 +181,17 @@ static int __init nonx32_setup(char *str
}
__setup("noexec32=", nonx32_setup);
+static __init unsigned long get_table_end(void)
+{
+ BUG_ON(!e820_table_end);
+ if (xen_start_info->mfn_list < __START_KERNEL_map
+ && e820_table_end == xen_start_info->first_p2m_pfn) {
+ e820_table_end += xen_start_info->nr_p2m_frames;
+ e820_table_top += xen_start_info->nr_p2m_frames;
+ }
+ return e820_table_end++;
+}
+
/*
* NOTE: This function is marked __ref because it calls __init function
* (alloc_bootmem_pages). It's safe to do it ONLY when after_bootmem == 0.
@@ -192,8 +203,7 @@ static __ref void *spp_getpage(void)
if (after_bootmem)
ptr = (void *) get_zeroed_page(GFP_ATOMIC | __GFP_NOTRACK);
else if (e820_table_end < e820_table_top) {
- ptr = __va(e820_table_end << PAGE_SHIFT);
- e820_table_end++;
+ ptr = __va(get_table_end() << PAGE_SHIFT);
memset(ptr, 0, PAGE_SIZE);
} else
ptr = alloc_bootmem_pages(PAGE_SIZE);
@@ -388,8 +398,7 @@ static __ref void *alloc_low_page(unsign
return adr;
}
- BUG_ON(!e820_table_end);
- pfn = e820_table_end++;
+ pfn = get_table_end();
if (pfn >= e820_table_top)
panic("alloc_low_page: ran out of memory");
@@ -415,14 +424,29 @@ static inline int __meminit make_readonl
/* Make new page tables read-only on the first pass. */
if (!xen_feature(XENFEAT_writable_page_tables)
&& !max_pfn_mapped
- && (paddr >= (e820_table_start << PAGE_SHIFT))
- && (paddr < (e820_table_top << PAGE_SHIFT)))
- readonly = 1;
+ && (paddr >= (e820_table_start << PAGE_SHIFT))) {
+ unsigned long top = e820_table_top;
+
+ /* Account for the range get_table_end() skips. */
+ if (xen_start_info->mfn_list < __START_KERNEL_map
+ && e820_table_end <= xen_start_info->first_p2m_pfn
+ && top > xen_start_info->first_p2m_pfn)
+ top += xen_start_info->nr_p2m_frames;
+ if (paddr < (top << PAGE_SHIFT))
+ readonly = 1;
+ }
/* Make old page tables read-only. */
if (!xen_feature(XENFEAT_writable_page_tables)
&& (paddr >= (xen_start_info->pt_base - __START_KERNEL_map))
&& (paddr < (e820_table_end << PAGE_SHIFT)))
readonly = 1;
+ /* Make P->M table (and its page tables) read-only. */
+ if (!xen_feature(XENFEAT_writable_page_tables)
+ && xen_start_info->mfn_list < __START_KERNEL_map
+ && paddr >= (xen_start_info->first_p2m_pfn << PAGE_SHIFT)
+ && paddr < (xen_start_info->first_p2m_pfn
+ + xen_start_info->nr_p2m_frames) << PAGE_SHIFT)
+ readonly = 1;
/*
* No need for writable mapping of kernel image. This also ensures that
@@ -718,6 +742,12 @@ void __init xen_init_pt(void)
(PTRS_PER_PUD - pud_index(__START_KERNEL_map))
* sizeof(*level3_kernel_pgt));
+ /* Copy the initial P->M table mappings if necessary. */
+ addr = pgd_index(xen_start_info->mfn_list);
+ if (addr < pgd_index(__START_KERNEL_map))
+ init_level4_pgt[addr] =
+ ((pgd_t *)xen_start_info->pt_base)[addr];
+
/* Do an early initialization of the fixmap area. */
addr = __fix_to_virt(FIX_EARLYCON_MEM_BASE);
if (pud_present(level3_kernel_pgt[pud_index(addr)])) {
@@ -749,22 +779,27 @@ void __init xen_init_pt(void)
void __init xen_finish_init_mapping(void)
{
unsigned long start, end;
+ struct mmuext_op mmuext;
/* Re-vector virtual addresses pointing into the initial
mapping to the just-established permanent ones. */
xen_start_info = __va(__pa(xen_start_info));
xen_start_info->pt_base = (unsigned long)
__va(__pa(xen_start_info->pt_base));
- if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+ if (!xen_feature(XENFEAT_auto_translated_physmap)
+ && xen_start_info->mfn_list >= __START_KERNEL_map)
phys_to_machine_mapping =
__va(__pa(xen_start_info->mfn_list));
- xen_start_info->mfn_list = (unsigned long)
- phys_to_machine_mapping;
- }
if (xen_start_info->mod_start)
xen_start_info->mod_start = (unsigned long)
__va(__pa(xen_start_info->mod_start));
+ /* Unpin the no longer used Xen provided page tables. */
+ mmuext.cmd = MMUEXT_UNPIN_TABLE;
+ mmuext.arg1.mfn = virt_to_mfn(xen_start_info->pt_base);
+ if (HYPERVISOR_mmuext_op(&mmuext, 1, NULL, DOMID_SELF))
+ BUG();
+
/* Destroy the Xen-created mappings beyond the kernel image. */
start = PAGE_ALIGN(_brk_end);
end = __START_KERNEL_map + (e820_table_start << PAGE_SHIFT);
--- sle11sp1-2010-03-22.orig/arch/x86/mm/pageattr-xen.c 2010-03-11 09:32:10.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/pageattr-xen.c 2010-03-11 09:38:40.000000000 +0100
@@ -1438,7 +1438,7 @@ static void __make_page_writable(unsigne
pte = lookup_address(va, &level);
BUG_ON(!pte || level != PG_LEVEL_4K);
- if (HYPERVISOR_update_va_mapping(va, pte_mkwrite(*pte), 0))
+ if (HYPERVISOR_update_va_mapping(va, pte_mkwrite(*pte), UVMF_INVLPG))
BUG();
if (in_secondary_range(va)) {
unsigned long pfn = pte_pfn(*pte);
--- sle11sp1-2010-03-22.orig/arch/x86/mm/pgtable-xen.c 2010-03-22 13:00:09.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/pgtable-xen.c 2010-03-22 13:00:27.000000000 +0100
@@ -343,7 +343,7 @@ void __init xen_init_pgd_pin(void)
if (PTRS_PER_PUD > 1) /* not folded */
SetPagePinned(virt_to_page(pud));
for (u = 0; u < PTRS_PER_PUD; u++, pud++) {
- if (!pud_present(*pud))
+ if (!pud_present(*pud) || pud_large(*pud))
continue;
pmd = pmd_offset(pud, 0);
if (PTRS_PER_PMD > 1) /* not folded */
@@ -354,7 +354,7 @@ void __init xen_init_pgd_pin(void)
&& m >= pmd_index(HYPERVISOR_VIRT_START))
continue;
#endif
- if (!pmd_present(*pmd))
+ if (!pmd_present(*pmd) || pmd_large(*pmd))
continue;
SetPagePinned(pmd_page(*pmd));
}
--- sle11sp1-2010-03-22.orig/arch/x86/mm/pgtable_32-xen.c 2009-11-06 10:52:02.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/pgtable_32-xen.c 2009-12-04 12:12:10.000000000 +0100
@@ -175,6 +175,6 @@ void make_lowmem_page_writable(void *va,
pte = lookup_address((unsigned long)va, &level);
BUG_ON(!pte || level != PG_LEVEL_4K || !pte_present(*pte));
rc = HYPERVISOR_update_va_mapping(
- (unsigned long)va, pte_mkwrite(*pte), 0);
+ (unsigned long)va, pte_mkwrite(*pte), UVMF_INVLPG);
BUG_ON(rc);
}

337
xen-x86_64-pgd-alloc-order Normal file

@@ -0,0 +1,337 @@
From: jbeulich@novell.com
Subject: don't require order-1 allocations for pgd-s
Patch-mainline: obsolete
At the same time, remove the no longer needed user-mode counterpart page of init_level4_pgt.
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/hypervisor.h 2009-11-23 10:55:40.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/hypervisor.h 2009-12-04 12:11:43.000000000 +0100
@@ -104,8 +104,8 @@ void do_hypervisor_callback(struct pt_re
* be MACHINE addresses.
*/
-void xen_pt_switch(unsigned long ptr);
-void xen_new_user_pt(unsigned long ptr); /* x86_64 only */
+void xen_pt_switch(pgd_t *);
+void xen_new_user_pt(pgd_t *); /* x86_64 only */
void xen_load_gs(unsigned int selector); /* x86_64 only */
void xen_tlb_flush(void);
void xen_invlpg(unsigned long ptr);
@@ -113,7 +113,7 @@ void xen_invlpg(unsigned long ptr);
void xen_l1_entry_update(pte_t *ptr, pte_t val);
void xen_l2_entry_update(pmd_t *ptr, pmd_t val);
void xen_l3_entry_update(pud_t *ptr, pud_t val); /* x86_64/PAE */
-void xen_l4_entry_update(pgd_t *ptr, int user, pgd_t val); /* x86_64 only */
+void xen_l4_entry_update(pgd_t *ptr, pgd_t val); /* x86_64 only */
void xen_pgd_pin(pgd_t *);
void xen_pgd_unpin(pgd_t *);
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/mmu_context.h 2009-11-06 10:52:22.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/mmu_context.h 2009-12-04 12:11:43.000000000 +0100
@@ -82,6 +82,9 @@ static inline void switch_mm(struct mm_s
{
unsigned cpu = smp_processor_id();
struct mmuext_op _op[2 + (sizeof(long) > 4)], *op = _op;
+#ifdef CONFIG_X86_64
+ pgd_t *upgd;
+#endif
if (likely(prev != next)) {
BUG_ON(!xen_feature(XENFEAT_writable_page_tables) &&
@@ -100,10 +103,11 @@ static inline void switch_mm(struct mm_s
op->arg1.mfn = virt_to_mfn(next->pgd);
op++;
- /* xen_new_user_pt(__pa(__user_pgd(next->pgd))) */
+ /* xen_new_user_pt(next->pgd) */
#ifdef CONFIG_X86_64
op->cmd = MMUEXT_NEW_USER_BASEPTR;
- op->arg1.mfn = virt_to_mfn(__user_pgd(next->pgd));
+ upgd = __user_pgd(next->pgd);
+ op->arg1.mfn = likely(upgd) ? virt_to_mfn(upgd) : 0;
op++;
#endif
@@ -131,7 +135,7 @@ static inline void switch_mm(struct mm_s
* to make sure to use no freed page tables.
*/
load_cr3(next->pgd);
- xen_new_user_pt(__pa(__user_pgd(next->pgd)));
+ xen_new_user_pt(next->pgd);
load_LDT_nolock(&next->context);
}
}
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/pgalloc.h 2010-03-22 12:59:30.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/pgalloc.h 2010-03-22 13:00:16.000000000 +0100
@@ -123,15 +123,13 @@ static inline void pud_populate(struct m
#endif /* CONFIG_X86_PAE */
#if PAGETABLE_LEVELS > 3
-#define __user_pgd(pgd) ((pgd) + PTRS_PER_PGD)
-
static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
{
pgd_t ent = __pgd(_PAGE_TABLE | __pa(pud));
paravirt_alloc_pud(mm, __pa(pud) >> PAGE_SHIFT);
if (unlikely(PagePinned(virt_to_page(pgd))))
- xen_l4_entry_update(pgd, 1, ent);
+ xen_l4_entry_update(pgd, ent);
else
*__user_pgd(pgd) = *pgd = ent;
}
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/pgtable_64.h 2009-11-06 11:12:01.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/pgtable_64.h 2009-12-04 12:11:43.000000000 +0100
@@ -100,18 +100,25 @@ static inline void xen_set_pud(pud_t *pu
: (void)(*__pudp = xen_make_pud(0)); \
})
-#define __user_pgd(pgd) ((pgd) + PTRS_PER_PGD)
+static inline pgd_t *__user_pgd(pgd_t *pgd)
+{
+ if (unlikely(((unsigned long)pgd & PAGE_MASK)
+ == (unsigned long)init_level4_pgt))
+ return NULL;
+ return (pgd_t *)(virt_to_page(pgd)->index
+ + ((unsigned long)pgd & ~PAGE_MASK));
+}
static inline void xen_set_pgd(pgd_t *pgdp, pgd_t pgd)
{
- xen_l4_entry_update(pgdp, 0, pgd);
+ xen_l4_entry_update(pgdp, pgd);
}
#define xen_pgd_clear(pgd) \
({ \
pgd_t *__pgdp = (pgd); \
PagePinned(virt_to_page(__pgdp)) \
- ? xen_l4_entry_update(__pgdp, 1, xen_make_pgd(0)) \
+ ? xen_l4_entry_update(__pgdp, xen_make_pgd(0)) \
: (void)(*__user_pgd(__pgdp) = *__pgdp = xen_make_pgd(0)); \
})
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/cpu/common-xen.c 2009-11-06 11:12:01.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/cpu/common-xen.c 2010-01-18 17:14:45.000000000 +0100
@@ -1026,8 +1026,7 @@ DEFINE_PER_CPU_FIRST(union irq_stack_uni
void xen_switch_pt(void)
{
#ifdef CONFIG_XEN
- xen_pt_switch(__pa_symbol(init_level4_pgt));
- xen_new_user_pt(__pa_symbol(__user_pgd(init_level4_pgt)));
+ xen_pt_switch(init_level4_pgt);
#endif
}
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/head_64-xen.S 2009-12-04 14:37:14.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/head_64-xen.S 2009-12-04 14:37:53.000000000 +0100
@@ -56,14 +56,6 @@ ENTRY(name)
__PAGE_ALIGNED_BSS
NEXT_PAGE(init_level4_pgt)
.fill 512,8,0
- /*
- * We update two pgd entries to make kernel and user pgd consistent
- * at pgd_populate(). It can be used for kernel modules. So we place
- * this page here for those cases to avoid memory corruption.
- * We also use this page to establish the initial mapping for the
- * vsyscall area.
- */
- .fill 512,8,0
NEXT_PAGE(level3_kernel_pgt)
.fill 512,8,0
--- sle11sp1-2010-03-22.orig/arch/x86/mm/hypervisor.c 2010-01-05 16:47:51.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/hypervisor.c 2010-01-05 16:47:55.000000000 +0100
@@ -525,7 +525,7 @@ void xen_l3_entry_update(pud_t *ptr, pud
#endif
#ifdef CONFIG_X86_64
-void xen_l4_entry_update(pgd_t *ptr, int user, pgd_t val)
+void xen_l4_entry_update(pgd_t *ptr, pgd_t val)
{
mmu_update_t u[2];
struct page *page = NULL;
@@ -538,8 +538,11 @@ void xen_l4_entry_update(pgd_t *ptr, int
}
u[0].ptr = virt_to_machine(ptr);
u[0].val = __pgd_val(val);
- if (user) {
- u[1].ptr = virt_to_machine(__user_pgd(ptr));
+ if (((unsigned long)ptr & ~PAGE_MASK)
+ <= pgd_index(TASK_SIZE_MAX) * sizeof(*ptr)) {
+ ptr = __user_pgd(ptr);
+ BUG_ON(!ptr);
+ u[1].ptr = virt_to_machine(ptr);
u[1].val = __pgd_val(val);
do_lN_entry_update(u, 2, page);
} else
@@ -547,21 +550,25 @@ void xen_l4_entry_update(pgd_t *ptr, int
}
#endif /* CONFIG_X86_64 */
-void xen_pt_switch(unsigned long ptr)
+#ifdef CONFIG_X86_64
+void xen_pt_switch(pgd_t *pgd)
{
struct mmuext_op op;
op.cmd = MMUEXT_NEW_BASEPTR;
- op.arg1.mfn = pfn_to_mfn(ptr >> PAGE_SHIFT);
+ op.arg1.mfn = virt_to_mfn(pgd);
BUG_ON(HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF) < 0);
}
-void xen_new_user_pt(unsigned long ptr)
+void xen_new_user_pt(pgd_t *pgd)
{
struct mmuext_op op;
+
+ pgd = __user_pgd(pgd);
op.cmd = MMUEXT_NEW_USER_BASEPTR;
- op.arg1.mfn = pfn_to_mfn(ptr >> PAGE_SHIFT);
+ op.arg1.mfn = pgd ? virt_to_mfn(pgd) : 0;
BUG_ON(HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF) < 0);
}
+#endif
void xen_tlb_flush(void)
{
@@ -638,7 +645,14 @@ void xen_pgd_pin(pgd_t *pgd)
op[0].arg1.mfn = virt_to_mfn(pgd);
#ifdef CONFIG_X86_64
op[1].cmd = op[0].cmd = MMUEXT_PIN_L4_TABLE;
- op[1].arg1.mfn = virt_to_mfn(__user_pgd(pgd));
+ pgd = __user_pgd(pgd);
+ if (pgd)
+ op[1].arg1.mfn = virt_to_mfn(pgd);
+ else {
+ op[1].cmd = MMUEXT_PIN_L3_TABLE;
+ op[1].arg1.mfn = pfn_to_mfn(__pa_symbol(level3_user_pgt)
+ >> PAGE_SHIFT);
+ }
#endif
if (HYPERVISOR_mmuext_op(op, NR_PGD_PIN_OPS, NULL, DOMID_SELF) < 0)
BUG();
@@ -651,8 +665,10 @@ void xen_pgd_unpin(pgd_t *pgd)
op[0].cmd = MMUEXT_UNPIN_TABLE;
op[0].arg1.mfn = virt_to_mfn(pgd);
#ifdef CONFIG_X86_64
+ pgd = __user_pgd(pgd);
+ BUG_ON(!pgd);
op[1].cmd = MMUEXT_UNPIN_TABLE;
- op[1].arg1.mfn = virt_to_mfn(__user_pgd(pgd));
+ op[1].arg1.mfn = virt_to_mfn(pgd);
#endif
if (HYPERVISOR_mmuext_op(op, NR_PGD_PIN_OPS, NULL, DOMID_SELF) < 0)
BUG();
--- sle11sp1-2010-03-22.orig/arch/x86/mm/init_64-xen.c 2009-10-13 17:25:37.000000000 +0200
+++ sle11sp1-2010-03-22/arch/x86/mm/init_64-xen.c 2009-12-04 12:11:43.000000000 +0100
@@ -718,9 +718,6 @@ void __init xen_init_pt(void)
(PTRS_PER_PUD - pud_index(__START_KERNEL_map))
* sizeof(*level3_kernel_pgt));
- __user_pgd(init_level4_pgt)[pgd_index(VSYSCALL_START)] =
- __pgd(__pa_symbol(level3_user_pgt) | _PAGE_TABLE);
-
/* Do an early initialization of the fixmap area. */
addr = __fix_to_virt(FIX_EARLYCON_MEM_BASE);
if (pud_present(level3_kernel_pgt[pud_index(addr)])) {
@@ -736,8 +733,6 @@ void __init xen_init_pt(void)
early_make_page_readonly(init_level4_pgt,
XENFEAT_writable_page_tables);
- early_make_page_readonly(__user_pgd(init_level4_pgt),
- XENFEAT_writable_page_tables);
early_make_page_readonly(level3_kernel_pgt,
XENFEAT_writable_page_tables);
early_make_page_readonly(level3_user_pgt,
--- sle11sp1-2010-03-22.orig/arch/x86/mm/pgtable-xen.c 2010-03-22 13:00:04.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/pgtable-xen.c 2010-03-22 13:00:09.000000000 +0100
@@ -290,9 +290,11 @@ static void pgd_walk(pgd_t *pgd_base, pg
BUG();
seq = 0;
}
+ pgd = __user_pgd(pgd_base);
+ BUG_ON(!pgd);
MULTI_update_va_mapping(mcl + seq,
- (unsigned long)__user_pgd(pgd_base),
- pfn_pte(virt_to_phys(__user_pgd(pgd_base))>>PAGE_SHIFT, flags),
+ (unsigned long)pgd,
+ pfn_pte(virt_to_phys(pgd)>>PAGE_SHIFT, flags),
0);
MULTI_update_va_mapping(mcl + seq + 1,
(unsigned long)pgd_base,
@@ -680,12 +682,29 @@ static void pgd_prepopulate_pmd(struct m
}
}
+static inline pgd_t *user_pgd_alloc(pgd_t *pgd)
+{
#ifdef CONFIG_X86_64
-/* We allocate two contiguous pages for kernel and user. */
-#define PGD_ORDER 1
-#else
-#define PGD_ORDER 0
+ if (pgd) {
+ pgd_t *upgd = (void *)__get_free_page(PGALLOC_GFP);
+
+ if (upgd)
+ virt_to_page(pgd)->index = (long)upgd;
+ else {
+ free_page((unsigned long)pgd);
+ pgd = NULL;
+ }
+ }
+#endif
+ return pgd;
+}
+
+static inline void user_pgd_free(pgd_t *pgd)
+{
+#ifdef CONFIG_X86_64
+ free_page(virt_to_page(pgd)->index);
#endif
+}
pgd_t *pgd_alloc(struct mm_struct *mm)
{
@@ -693,7 +712,7 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
pmd_t *pmds[PREALLOCATED_PMDS];
unsigned long flags;
- pgd = (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ORDER);
+ pgd = user_pgd_alloc((void *)__get_free_page(PGALLOC_GFP));
if (pgd == NULL)
goto out;
@@ -732,7 +751,8 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
out_free_pmds:
free_pmds(pmds, mm, !xen_feature(XENFEAT_pae_pgdir_above_4gb));
out_free_pgd:
- free_pages((unsigned long)pgd, PGD_ORDER);
+ user_pgd_free(pgd);
+ free_page((unsigned long)pgd);
out:
return NULL;
}
@@ -751,7 +771,8 @@ void pgd_free(struct mm_struct *mm, pgd_
pgd_mop_up_pmds(mm, pgd);
paravirt_pgd_free(mm, pgd);
- free_pages((unsigned long)pgd, PGD_ORDER);
+ user_pgd_free(pgd);
+ free_page((unsigned long)pgd);
}
/* blktap and gntdev need this, as otherwise they would implicitly (and
--- sle11sp1-2010-03-22.orig/drivers/xen/core/machine_reboot.c 2009-12-18 14:15:17.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/core/machine_reboot.c 2009-12-18 14:15:58.000000000 +0100
@@ -188,8 +188,7 @@ static int take_machine_down(void *_susp
* in fast-suspend mode as that implies a new enough Xen.
*/
if (!suspend->fast_suspend)
- xen_new_user_pt(__pa(__user_pgd(
- current->active_mm->pgd)));
+ xen_new_user_pt(current->active_mm->pgd);
#endif
}

111
xen-x86_64-pgd-pin Normal file

@@ -0,0 +1,111 @@
From: jbeulich@novell.com
Subject: make pinning of pgd pairs transparent to callers
Patch-mainline: obsolete
--- sle11sp1-2010-03-22.orig/arch/x86/include/mach-xen/asm/hypervisor.h 2009-11-23 10:53:45.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/include/mach-xen/asm/hypervisor.h 2009-11-23 10:55:40.000000000 +0100
@@ -114,8 +114,8 @@ void xen_l1_entry_update(pte_t *ptr, pte
void xen_l2_entry_update(pmd_t *ptr, pmd_t val);
void xen_l3_entry_update(pud_t *ptr, pud_t val); /* x86_64/PAE */
void xen_l4_entry_update(pgd_t *ptr, int user, pgd_t val); /* x86_64 only */
-void xen_pgd_pin(unsigned long ptr);
-void xen_pgd_unpin(unsigned long ptr);
+void xen_pgd_pin(pgd_t *);
+void xen_pgd_unpin(pgd_t *);
void xen_init_pgd_pin(void);
--- sle11sp1-2010-03-22.orig/arch/x86/mm/hypervisor.c 2010-01-05 16:47:18.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/hypervisor.c 2010-01-05 16:47:51.000000000 +0100
@@ -624,26 +624,38 @@ EXPORT_SYMBOL_GPL(xen_invlpg_mask);
#endif /* CONFIG_SMP */
-void xen_pgd_pin(unsigned long ptr)
-{
- struct mmuext_op op;
#ifdef CONFIG_X86_64
- op.cmd = MMUEXT_PIN_L4_TABLE;
-#elif defined(CONFIG_X86_PAE)
- op.cmd = MMUEXT_PIN_L3_TABLE;
+#define NR_PGD_PIN_OPS 2
#else
- op.cmd = MMUEXT_PIN_L2_TABLE;
+#define NR_PGD_PIN_OPS 1
#endif
- op.arg1.mfn = pfn_to_mfn(ptr >> PAGE_SHIFT);
- BUG_ON(HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF) < 0);
+
+void xen_pgd_pin(pgd_t *pgd)
+{
+ struct mmuext_op op[NR_PGD_PIN_OPS];
+
+ op[0].cmd = MMUEXT_PIN_L3_TABLE;
+ op[0].arg1.mfn = virt_to_mfn(pgd);
+#ifdef CONFIG_X86_64
+ op[1].cmd = op[0].cmd = MMUEXT_PIN_L4_TABLE;
+ op[1].arg1.mfn = virt_to_mfn(__user_pgd(pgd));
+#endif
+ if (HYPERVISOR_mmuext_op(op, NR_PGD_PIN_OPS, NULL, DOMID_SELF) < 0)
+ BUG();
}
-void xen_pgd_unpin(unsigned long ptr)
+void xen_pgd_unpin(pgd_t *pgd)
{
- struct mmuext_op op;
- op.cmd = MMUEXT_UNPIN_TABLE;
- op.arg1.mfn = pfn_to_mfn(ptr >> PAGE_SHIFT);
- BUG_ON(HYPERVISOR_mmuext_op(&op, 1, NULL, DOMID_SELF) < 0);
+ struct mmuext_op op[NR_PGD_PIN_OPS];
+
+ op[0].cmd = MMUEXT_UNPIN_TABLE;
+ op[0].arg1.mfn = virt_to_mfn(pgd);
+#ifdef CONFIG_X86_64
+ op[1].cmd = MMUEXT_UNPIN_TABLE;
+ op[1].arg1.mfn = virt_to_mfn(__user_pgd(pgd));
+#endif
+ if (HYPERVISOR_mmuext_op(op, NR_PGD_PIN_OPS, NULL, DOMID_SELF) < 0)
+ BUG();
}
void xen_set_ldt(const void *ptr, unsigned int ents)
--- sle11sp1-2010-03-22.orig/arch/x86/mm/init_64-xen.c 2009-11-06 11:12:01.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/init_64-xen.c 2009-10-13 17:25:37.000000000 +0200
@@ -747,10 +747,8 @@ void __init xen_init_pt(void)
early_make_page_readonly(level1_fixmap_pgt,
XENFEAT_writable_page_tables);
- if (!xen_feature(XENFEAT_writable_page_tables)) {
- xen_pgd_pin(__pa_symbol(init_level4_pgt));
- xen_pgd_pin(__pa_symbol(__user_pgd(init_level4_pgt)));
- }
+ if (!xen_feature(XENFEAT_writable_page_tables))
+ xen_pgd_pin(init_level4_pgt);
}
void __init xen_finish_init_mapping(void)
--- sle11sp1-2010-03-22.orig/arch/x86/mm/pgtable-xen.c 2010-03-22 12:59:47.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/pgtable-xen.c 2010-03-22 13:00:04.000000000 +0100
@@ -367,19 +367,13 @@ static void __pgd_pin(pgd_t *pgd)
{
pgd_walk(pgd, PAGE_KERNEL_RO);
kmap_flush_unused();
- xen_pgd_pin(__pa(pgd)); /* kernel */
-#ifdef CONFIG_X86_64
- xen_pgd_pin(__pa(__user_pgd(pgd))); /* user */
-#endif
+ xen_pgd_pin(pgd);
SetPagePinned(virt_to_page(pgd));
}
static void __pgd_unpin(pgd_t *pgd)
{
- xen_pgd_unpin(__pa(pgd));
-#ifdef CONFIG_X86_64
- xen_pgd_unpin(__pa(__user_pgd(pgd)));
-#endif
+ xen_pgd_unpin(pgd);
pgd_walk(pgd, PAGE_KERNEL);
ClearPagePinned(virt_to_page(pgd));
}

193
xen3-auto-arch-i386.diff Normal file

@@ -0,0 +1,193 @@
Subject: xen3 arch-i386
From: http://xenbits.xensource.com/linux-2.6.18-xen.hg (tip 1011:11175e60d393)
Patch-mainline: obsolete
Acked-by: jbeulich@novell.com
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/asm-offsets_32.c 2010-03-01 14:09:07.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/kernel/asm-offsets_32.c 2009-12-04 10:44:46.000000000 +0100
@@ -93,9 +93,14 @@ void foo(void)
OFFSET(pbe_orig_address, pbe, orig_address);
OFFSET(pbe_next, pbe, next);
+#ifndef CONFIG_X86_NO_TSS
/* Offset from the sysenter stack to tss.sp0 */
- DEFINE(TSS_sysenter_sp0, offsetof(struct tss_struct, x86_tss.sp0) -
+ DEFINE(SYSENTER_stack_sp0, offsetof(struct tss_struct, x86_tss.sp0) -
sizeof(struct tss_struct));
+#else
+ /* sysenter stack points directly to sp0 */
+ DEFINE(SYSENTER_stack_sp0, 0);
+#endif
DEFINE(PAGE_SIZE_asm, PAGE_SIZE);
DEFINE(PAGE_SHIFT_asm, PAGE_SHIFT);
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/entry_32.S 2010-03-01 14:09:07.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/kernel/entry_32.S 2009-12-04 10:44:46.000000000 +0100
@@ -393,7 +393,7 @@ ENTRY(ia32_sysenter_target)
CFI_SIGNAL_FRAME
CFI_DEF_CFA esp, 0
CFI_REGISTER esp, ebp
- movl TSS_sysenter_sp0(%esp),%esp
+ movl SYSENTER_stack_sp0(%esp),%esp
sysenter_past_esp:
/*
* Interrupts are disabled here, but we can't trace it until
@@ -1325,7 +1325,7 @@ END(page_fault)
* that sets up the real kernel stack. Check here, since we can't
* allow the wrong stack to be used.
*
- * "TSS_sysenter_sp0+12" is because the NMI/debug handler will have
+ * "SYSENTER_stack_sp0+12" is because the NMI/debug handler will have
* already pushed 3 words if it hits on the sysenter instruction:
* eflags, cs and eip.
*
@@ -1337,7 +1337,7 @@ END(page_fault)
cmpw $__KERNEL_CS, 4(%esp)
jne \ok
\label:
- movl TSS_sysenter_sp0 + \offset(%esp), %esp
+ movl SYSENTER_stack_sp0 + \offset(%esp), %esp
CFI_DEF_CFA esp, 0
CFI_UNDEFINED eip
pushfl
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/machine_kexec_32.c 2009-04-21 10:33:15.000000000 +0200
+++ sle11sp1-2010-03-01/arch/x86/kernel/machine_kexec_32.c 2009-12-04 10:44:46.000000000 +0100
@@ -26,6 +26,10 @@
#include <asm/system.h>
#include <asm/cacheflush.h>
+#ifdef CONFIG_XEN
+#include <xen/interface/kexec.h>
+#endif
+
static void machine_kexec_free_page_tables(struct kimage *image)
{
free_page((unsigned long)image->arch.pgd);
@@ -96,6 +100,55 @@ static void machine_kexec_prepare_page_t
__pa(control_page), __pa(control_page));
}
+#ifdef CONFIG_XEN
+
+#define __ma(x) (pfn_to_mfn(__pa((x)) >> PAGE_SHIFT) << PAGE_SHIFT)
+
+#if PAGES_NR > KEXEC_XEN_NO_PAGES
+#error PAGES_NR is greater than KEXEC_XEN_NO_PAGES - Xen support will break
+#endif
+
+#if PA_CONTROL_PAGE != 0
+#error PA_CONTROL_PAGE is non zero - Xen support will break
+#endif
+
+void machine_kexec_setup_load_arg(xen_kexec_image_t *xki, struct kimage *image)
+{
+ void *control_page;
+
+ memset(xki->page_list, 0, sizeof(xki->page_list));
+
+ control_page = page_address(image->control_code_page);
+ memcpy(control_page, relocate_kernel, PAGE_SIZE);
+
+ xki->page_list[PA_CONTROL_PAGE] = __ma(control_page);
+ xki->page_list[PA_PGD] = __ma(kexec_pgd);
+#ifdef CONFIG_X86_PAE
+ xki->page_list[PA_PMD_0] = __ma(kexec_pmd0);
+ xki->page_list[PA_PMD_1] = __ma(kexec_pmd1);
+#endif
+ xki->page_list[PA_PTE_0] = __ma(kexec_pte0);
+ xki->page_list[PA_PTE_1] = __ma(kexec_pte1);
+
+}
+
+int __init machine_kexec_setup_resources(struct resource *hypervisor,
+ struct resource *phys_cpus,
+ int nr_phys_cpus)
+{
+ int k;
+
+ /* The per-cpu crash note resources belong to the hypervisor resource */
+ for (k = 0; k < nr_phys_cpus; k++)
+ request_resource(hypervisor, phys_cpus + k);
+
+ return 0;
+}
+
+void machine_kexec_register_resources(struct resource *res) { ; }
+
+#endif /* CONFIG_XEN */
+
/*
* A architecture hook called to validate the
* proposed image and prepare the control pages
@@ -135,6 +188,7 @@ void machine_kexec_cleanup(struct kimage
machine_kexec_free_page_tables(image);
}
+#ifndef CONFIG_XEN
/*
* Do not allocate memory (or fail in any way) in machine_kexec().
* We are past the point of no return, committed to rebooting now.
@@ -199,6 +253,7 @@ void machine_kexec(struct kimage *image)
__ftrace_enabled_restore(save_ftrace_enabled);
}
+#endif
void arch_crash_save_vmcoreinfo(void)
{
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/vm86_32.c 2010-03-01 14:09:07.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/kernel/vm86_32.c 2009-12-04 10:44:46.000000000 +0100
@@ -125,7 +125,9 @@ static int copy_vm86_regs_from_user(stru
struct pt_regs *save_v86_state(struct kernel_vm86_regs *regs)
{
+#ifndef CONFIG_X86_NO_TSS
struct tss_struct *tss;
+#endif
struct pt_regs *ret;
unsigned long tmp;
@@ -148,12 +150,16 @@ struct pt_regs *save_v86_state(struct ke
do_exit(SIGSEGV);
}
+#ifndef CONFIG_X86_NO_TSS
tss = &per_cpu(init_tss, get_cpu());
+#endif
current->thread.sp0 = current->thread.saved_sp0;
current->thread.sysenter_cs = __KERNEL_CS;
load_sp0(tss, &current->thread);
current->thread.saved_sp0 = 0;
+#ifndef CONFIG_X86_NO_TSS
put_cpu();
+#endif
ret = KVM86->regs32;
@@ -280,7 +286,9 @@ out:
static void do_sys_vm86(struct kernel_vm86_struct *info, struct task_struct *tsk)
{
+#ifndef CONFIG_X86_NO_TSS
struct tss_struct *tss;
+#endif
/*
* make sure the vm86() system call doesn't try to do anything silly
*/
@@ -324,12 +332,16 @@ static void do_sys_vm86(struct kernel_vm
tsk->thread.saved_fs = info->regs32->fs;
tsk->thread.saved_gs = get_user_gs(info->regs32);
+#ifndef CONFIG_X86_NO_TSS
tss = &per_cpu(init_tss, get_cpu());
+#endif
tsk->thread.sp0 = (unsigned long) &info->VM86_TSS_ESP0;
if (cpu_has_sep)
tsk->thread.sysenter_cs = 0;
load_sp0(tss, &tsk->thread);
+#ifndef CONFIG_X86_NO_TSS
put_cpu();
+#endif
tsk->thread.screen_bitmap = info->screen_bitmap;
if (info->flags & VM86_SCREEN_BITMAP)

464
xen3-auto-arch-x86.diff Normal file

@@ -0,0 +1,464 @@
Subject: xen3 arch-x86
From: http://xenbits.xensource.com/linux-2.6.18-xen.hg (tip 1011:11175e60d393)
Patch-mainline: obsolete
Acked-by: jbeulich@novell.com
List of files that no longer require modification (and have hence been
removed from this patch), kept for reference and in case upstream wants
to take the forward-porting patches:
2.6.26/arch/x86/kernel/crash.c
2.6.30/arch/x86/kernel/acpi/boot.c
--- sle11sp1-2010-03-29.orig/arch/x86/Makefile 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/Makefile 2009-12-04 10:44:45.000000000 +0100
@@ -111,6 +111,10 @@ endif
# prevent gcc from generating any FP code by mistake
KBUILD_CFLAGS += $(call cc-option,-mno-sse -mno-mmx -mno-sse2 -mno-3dnow,)
+# Xen subarch support
+mflags-$(CONFIG_X86_XEN) := -Iinclude/asm-x86/mach-xen
+mcore-$(CONFIG_X86_XEN) := arch/x86/mach-xen/
+
KBUILD_CFLAGS += $(mflags-y)
KBUILD_AFLAGS += $(mflags-y)
@@ -151,9 +155,26 @@ boot := arch/x86/boot
BOOT_TARGETS = bzlilo bzdisk fdimage fdimage144 fdimage288 isoimage
-PHONY += bzImage $(BOOT_TARGETS)
+PHONY += bzImage vmlinuz $(BOOT_TARGETS)
+
+ifdef CONFIG_XEN
+CPPFLAGS := -D__XEN_INTERFACE_VERSION__=$(CONFIG_XEN_INTERFACE_VERSION) \
+ -Iinclude$(if $(KBUILD_SRC),2)/asm/mach-xen $(CPPFLAGS)
+
+ifdef CONFIG_X86_64
+LDFLAGS_vmlinux := -e startup_64
+endif
# Default kernel to build
+all: vmlinuz
+
+# KBUILD_IMAGE specifies the target image being built
+KBUILD_IMAGE := $(boot)/vmlinuz
+
+vmlinuz: vmlinux
+ $(Q)$(MAKE) $(build)=$(boot) $(KBUILD_IMAGE)
+else
+# Default kernel to build
all: bzImage
# KBUILD_IMAGE specify target image being built
@@ -166,6 +187,7 @@ bzImage: vmlinux
$(BOOT_TARGETS): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $@
+endif
PHONY += install
install:
--- sle11sp1-2010-03-29.orig/arch/x86/boot/Makefile 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/boot/Makefile 2009-12-04 10:44:45.000000000 +0100
@@ -23,6 +23,7 @@ ROOT_DEV := CURRENT
SVGA_MODE := -DSVGA_MODE=NORMAL_VGA
targets := vmlinux.bin setup.bin setup.elf bzImage
+targets += vmlinuz vmlinux-stripped
targets += fdimage fdimage144 fdimage288 image.iso mtools.conf
subdir- := compressed
@@ -195,6 +196,14 @@ bzlilo: $(obj)/bzImage
cp System.map $(INSTALL_PATH)/
if [ -x /sbin/lilo ]; then /sbin/lilo; else /etc/lilo/install; fi
+$(obj)/vmlinuz: $(obj)/vmlinux-stripped FORCE
+ $(call if_changed,gzip)
+ @echo 'Kernel: $@ is ready' ' (#'`cat .version`')'
+
+$(obj)/vmlinux-stripped: OBJCOPYFLAGS := -g --strip-unneeded
+$(obj)/vmlinux-stripped: vmlinux FORCE
+ $(call if_changed,objcopy)
+
install:
sh $(srctree)/$(src)/install.sh $(KERNELRELEASE) $(obj)/bzImage \
System.map "$(INSTALL_PATH)"
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/Makefile 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/kernel/Makefile 2009-12-04 10:44:45.000000000 +0100
@@ -117,9 +117,12 @@ obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION)
obj-$(CONFIG_SWIOTLB) += pci-swiotlb.o
+obj-$(CONFIG_X86_XEN) += fixup.o
+
###
# 64 bit specific files
ifeq ($(CONFIG_X86_64),y)
+ obj-$(CONFIG_X86_XEN_GENAPIC) += genapic_xen_64.o
obj-$(CONFIG_X86_UV) += tlb_uv.o bios_uv.o uv_irq.o uv_sysfs.o uv_time.o
obj-$(CONFIG_X86_PM_TIMER) += pmtimer_64.o
obj-$(CONFIG_AUDIT) += audit_64.o
@@ -130,4 +133,10 @@ ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_PCI_MMCONFIG) += mmconf-fam10h_64.o
obj-y += vsmp_64.o
+
+ time_64-$(CONFIG_XEN) += time_32.o
+ pci-dma_64-$(CONFIG_XEN) += pci-dma_32.o
endif
+
+disabled-obj-$(CONFIG_XEN) := i8259_$(BITS).o reboot.o smpboot_$(BITS).o
+%/head_$(BITS).o %/head_$(BITS).s: $(if $(CONFIG_XEN),EXTRA_AFLAGS,dummy) :=
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/acpi/Makefile 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/kernel/acpi/Makefile 2009-12-04 10:44:45.000000000 +0100
@@ -5,6 +5,9 @@ obj-$(CONFIG_ACPI_SLEEP) += sleep.o wake
ifneq ($(CONFIG_ACPI_PROCESSOR),)
obj-y += cstate.o processor.o
+ifneq ($(CONFIG_PROCESSOR_EXTERNAL_CONTROL),)
+obj-$(CONFIG_XEN) += processor_extcntl_xen.o
+endif
endif
$(obj)/wakeup_rm.o: $(obj)/realmode/wakeup.bin
@@ -12,3 +15,4 @@ $(obj)/wakeup_rm.o: $(obj)/realmode/w
$(obj)/realmode/wakeup.bin: FORCE
$(Q)$(MAKE) $(build)=$(obj)/realmode
+disabled-obj-$(CONFIG_XEN) := cstate.o wakeup_$(BITS).o
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/acpi/processor.c 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/kernel/acpi/processor.c 2009-12-04 10:44:45.000000000 +0100
@@ -76,7 +76,18 @@ static void init_intel_pdc(struct acpi_p
/* Initialize _PDC data based on the CPU vendor */
void arch_acpi_processor_init_pdc(struct acpi_processor *pr)
{
+#ifdef CONFIG_XEN
+ /*
+ * As a work-around, just use cpu0's cpuinfo for all processors.
+ * Further work is required to expose xen hypervisor interface of
+ * getting physical cpuinfo to dom0 kernel and then
+ * arch_acpi_processor_init_pdc can set _PDC parameters according
+ * to Xen's phys information.
+ */
+ struct cpuinfo_x86 *c = &boot_cpu_data;
+#else
struct cpuinfo_x86 *c = &cpu_data(pr->id);
+#endif
pr->pdc = NULL;
if (c->x86_vendor == X86_VENDOR_INTEL ||
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/cpu/mcheck/Makefile 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/kernel/cpu/mcheck/Makefile 2010-01-27 14:28:25.000000000 +0100
@@ -4,6 +4,7 @@ obj-$(CONFIG_X86_ANCIENT_MCE) += winchip
obj-$(CONFIG_X86_MCE_INTEL) += mce_intel.o
obj-$(CONFIG_X86_MCE_XEON75XX) += mce-xeon75xx.o
obj-$(CONFIG_X86_MCE_AMD) += mce_amd.o
+obj-$(CONFIG_X86_XEN_MCE) += mce_dom0.o
obj-$(CONFIG_X86_MCE_THRESHOLD) += threshold.o
obj-$(CONFIG_X86_MCE_INJECT) += mce-inject.o
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/cpu/mcheck/mce.c 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/kernel/cpu/mcheck/mce.c 2010-01-27 14:28:39.000000000 +0100
@@ -1127,8 +1127,15 @@ void mce_log_therm_throt_event(__u64 sta
* Periodic polling timer for "silent" machine check errors. If the
* poller finds an MCE, poll 2x faster. When the poller finds no more
* errors, poll 2x slower (up to check_interval seconds).
+ *
+ * We will disable polling in DOM0 since all CMCI/Polling
+ * mechanism will be done in XEN for Intel CPUs
*/
+#if defined (CONFIG_X86_XEN_MCE)
+static int check_interval = 0; /* disable polling */
+#else
static int check_interval = 5 * 60; /* 5 minutes */
+#endif
static DEFINE_PER_CPU(int, mce_next_interval); /* in jiffies */
static DEFINE_PER_CPU(struct timer_list, mce_timer);
@@ -1293,6 +1300,7 @@ static int __cpuinit mce_cpu_quirks(stru
/* This should be disabled by the BIOS, but isn't always */
if (c->x86_vendor == X86_VENDOR_AMD) {
+#ifndef CONFIG_XEN
if (c->x86 == 15 && banks > 4) {
/*
* disable GART TBL walk error reporting, which
@@ -1301,6 +1309,7 @@ static int __cpuinit mce_cpu_quirks(stru
*/
clear_bit(10, (unsigned long *)&mce_banks[4].ctl);
}
+#endif
if (c->x86 <= 17 && mce_bootlog < 0) {
/*
* Lots of broken BIOS around that don't clear them
@@ -1368,6 +1377,7 @@ static void __cpuinit mce_ancient_init(s
static void mce_cpu_features(struct cpuinfo_x86 *c)
{
+#ifndef CONFIG_X86_64_XEN
switch (c->x86_vendor) {
case X86_VENDOR_INTEL:
mce_intel_feature_init(c);
@@ -1378,6 +1388,7 @@ static void mce_cpu_features(struct cpui
default:
break;
}
+#endif
}
static void mce_init_timer(void)
@@ -2064,6 +2075,16 @@ static __init int mce_init_device(void)
register_hotcpu_notifier(&mce_cpu_notifier);
misc_register(&mce_log_device);
+#ifdef CONFIG_X86_XEN_MCE
+ if (is_initial_xendomain()) {
+ /* Register vIRQ handler for MCE LOG processing */
+ extern void bind_virq_for_mce(void);
+
+ printk(KERN_DEBUG "MCE: bind virq for DOM0 logging\n");
+ bind_virq_for_mce();
+ }
+#endif
+
return err;
}
--- sle11sp1-2010-03-29.orig/arch/x86/kernel/cpu/mtrr/Makefile 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/kernel/cpu/mtrr/Makefile 2009-12-04 10:44:45.000000000 +0100
@@ -1,3 +1,4 @@
obj-y := main.o if.o generic.o state.o cleanup.o
obj-$(CONFIG_X86_32) += amd.o cyrix.o centaur.o
+obj-$(CONFIG_XEN) := main.o if.o
--- sle11sp1-2010-03-29.orig/arch/x86/lib/Makefile 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/lib/Makefile 2010-03-29 09:06:18.000000000 +0200
@@ -28,3 +28,5 @@ else
lib-y += copy_user_64.o rwlock_64.o copy_user_nocache_64.o
lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem_64.o
endif
+
+lib-$(CONFIG_XEN_SCRUB_PAGES) += scrub.o
--- sle11sp1-2010-03-29.orig/arch/x86/mm/Makefile 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/mm/Makefile 2009-12-04 10:44:45.000000000 +0100
@@ -25,4 +25,6 @@ obj-$(CONFIG_NUMA) += numa.o numa_$(BIT
obj-$(CONFIG_K8_NUMA) += k8topology_64.o
obj-$(CONFIG_ACPI_NUMA) += srat_$(BITS).o
+obj-$(CONFIG_XEN) += hypervisor.o
+
obj-$(CONFIG_MEMTEST) += memtest.o
--- sle11sp1-2010-03-29.orig/arch/x86/oprofile/Makefile 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/oprofile/Makefile 2009-12-04 10:44:45.000000000 +0100
@@ -6,7 +6,14 @@ DRIVER_OBJS = $(addprefix ../../../drive
oprofilefs.o oprofile_stats.o \
timer_int.o )
+ifdef CONFIG_XEN
+XENOPROF_COMMON_OBJS = $(addprefix ../../../drivers/xen/xenoprof/, \
+ xenoprofile.o)
+oprofile-y := $(DRIVER_OBJS) \
+ $(XENOPROF_COMMON_OBJS) xenoprof.o
+else
oprofile-y := $(DRIVER_OBJS) init.o backtrace.o
oprofile-$(CONFIG_X86_LOCAL_APIC) += nmi_int.o op_model_amd.o \
op_model_ppro.o op_model_p4.o
oprofile-$(CONFIG_X86_IO_APIC) += nmi_timer_int.o
+endif
--- sle11sp1-2010-03-29.orig/arch/x86/pci/Makefile 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/pci/Makefile 2009-12-04 10:44:45.000000000 +0100
@@ -4,6 +4,9 @@ obj-$(CONFIG_PCI_BIOS) += pcbios.o
obj-$(CONFIG_PCI_MMCONFIG) += mmconfig_$(BITS).o direct.o mmconfig-shared.o
obj-$(CONFIG_PCI_DIRECT) += direct.o
obj-$(CONFIG_PCI_OLPC) += olpc.o
+# pcifront should be after mmconfig.o and direct.o as it should only
+# take over if direct access to the PCI bus is unavailable
+obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += pcifront.o
obj-y += fixup.o
obj-$(CONFIG_ACPI) += acpi.o
--- sle11sp1-2010-03-29.orig/arch/x86/power/cpu.c 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/power/cpu.c 2009-12-04 10:44:45.000000000 +0100
@@ -125,6 +125,7 @@ static void do_fpu_end(void)
static void fix_processor_context(void)
{
+#ifndef CONFIG_X86_NO_TSS
int cpu = smp_processor_id();
struct tss_struct *t = &per_cpu(init_tss, cpu);
@@ -137,7 +138,10 @@ static void fix_processor_context(void)
#ifdef CONFIG_X86_64
get_cpu_gdt_table(cpu)[GDT_ENTRY_TSS].type = 9;
+#endif
+#endif
+#ifdef CONFIG_X86_64
syscall_init(); /* This sets MSR_*STAR and related */
#endif
load_TR_desc(); /* This does ltr */
--- sle11sp1-2010-03-29.orig/arch/x86/include/asm/acpi.h 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/include/asm/acpi.h 2009-12-04 10:44:45.000000000 +0100
@@ -30,6 +30,10 @@
#include <asm/mmu.h>
#include <asm/mpspec.h>
+#ifdef CONFIG_XEN
+#include <xen/interface/platform.h>
+#endif
+
#define COMPILER_DEPENDENT_INT64 long long
#define COMPILER_DEPENDENT_UINT64 unsigned long long
@@ -120,6 +124,27 @@ extern unsigned long acpi_wakeup_address
/* early initialization routine */
extern void acpi_reserve_bootmem(void);
+#ifdef CONFIG_XEN
+static inline int acpi_notify_hypervisor_state(u8 sleep_state,
+ u32 pm1a_cnt_val,
+ u32 pm1b_cnt_val)
+{
+ struct xen_platform_op op = {
+ .cmd = XENPF_enter_acpi_sleep,
+ .interface_version = XENPF_INTERFACE_VERSION,
+ .u = {
+ .enter_acpi_sleep = {
+ .pm1a_cnt_val = pm1a_cnt_val,
+ .pm1b_cnt_val = pm1b_cnt_val,
+ .sleep_state = sleep_state,
+ },
+ },
+ };
+
+ return HYPERVISOR_platform_op(&op);
+}
+#endif /* CONFIG_XEN */
+
/*
* Check if the CPU can handle C2 and deeper
*/
@@ -152,7 +177,9 @@ static inline void disable_acpi(void) {
#endif /* !CONFIG_ACPI */
+#ifndef CONFIG_XEN
#define ARCH_HAS_POWER_INIT 1
+#endif
struct bootnode;
--- sle11sp1-2010-03-29.orig/arch/x86/include/asm/apic.h 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/include/asm/apic.h 2009-12-04 10:44:45.000000000 +0100
@@ -15,7 +15,9 @@
#include <asm/system.h>
#include <asm/msr.h>
+#ifndef CONFIG_XEN
#define ARCH_APICTIMER_STOPS_ON_C3 1
+#endif
/*
* Debugging macros
--- sle11sp1-2010-03-29.orig/arch/x86/include/asm/kexec.h 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/include/asm/kexec.h 2009-12-04 10:44:45.000000000 +0100
@@ -163,6 +163,19 @@ struct kimage_arch {
};
#endif
+/* Under Xen we need to work with machine addresses. These macros give the
+ * machine address of a certain page to the generic kexec code instead of
+ * the pseudo physical address which would be given by the default macros.
+ */
+
+#ifdef CONFIG_XEN
+#define KEXEC_ARCH_HAS_PAGE_MACROS
+#define kexec_page_to_pfn(page) pfn_to_mfn(page_to_pfn(page))
+#define kexec_pfn_to_page(pfn) pfn_to_page(mfn_to_pfn(pfn))
+#define kexec_virt_to_phys(addr) virt_to_machine(addr)
+#define kexec_phys_to_virt(addr) phys_to_virt(machine_to_phys(addr))
+#endif
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_KEXEC_H */
--- sle11sp1-2010-03-29.orig/arch/x86/include/asm/types.h 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/include/asm/types.h 2010-02-09 16:45:16.000000000 +0100
@@ -9,7 +9,7 @@
#ifndef __ASSEMBLY__
typedef u64 dma64_addr_t;
-#if defined(CONFIG_X86_64) || defined(CONFIG_HIGHMEM64G)
+#if defined(CONFIG_X86_64) || defined(CONFIG_XEN) || defined(CONFIG_HIGHMEM64G)
/* DMA addresses come in 32-bit and 64-bit flavours. */
typedef u64 dma_addr_t;
#else
--- sle11sp1-2010-03-29.orig/arch/x86/vdso/Makefile 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/vdso/Makefile 2009-12-04 10:44:49.000000000 +0100
@@ -65,6 +65,8 @@ obj-$(VDSO32-y) += vdso32-syms.lds
vdso32.so-$(VDSO32-y) += int80
vdso32.so-$(CONFIG_COMPAT) += syscall
vdso32.so-$(VDSO32-y) += sysenter
+xen-vdso32-$(subst 1,$(CONFIG_COMPAT),$(shell expr $(CONFIG_XEN_COMPAT)0 '<' 0x0302000)) += int80
+vdso32.so-$(CONFIG_XEN) += $(xen-vdso32-y)
vdso32-images = $(vdso32.so-y:%=vdso32-%.so)
--- sle11sp1-2010-03-29.orig/arch/x86/vdso/vdso32-setup.c 2010-03-29 09:00:35.000000000 +0200
+++ sle11sp1-2010-03-29/arch/x86/vdso/vdso32-setup.c 2009-12-04 10:44:46.000000000 +0100
@@ -26,6 +26,10 @@
#include <asm/vdso.h>
#include <asm/proto.h>
+#ifdef CONFIG_XEN
+#include <xen/interface/callback.h>
+#endif
+
enum {
VDSO_DISABLED = 0,
VDSO_ENABLED = 1,
@@ -225,6 +229,7 @@ static inline void map_compat_vdso(int m
void enable_sep_cpu(void)
{
+#ifndef CONFIG_XEN
int cpu = get_cpu();
struct tss_struct *tss = &per_cpu(init_tss, cpu);
@@ -239,6 +244,35 @@ void enable_sep_cpu(void)
wrmsr(MSR_IA32_SYSENTER_ESP, tss->x86_tss.sp1, 0);
wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long) ia32_sysenter_target, 0);
put_cpu();
+#else
+ extern asmlinkage void ia32pv_sysenter_target(void);
+ static struct callback_register sysenter = {
+ .type = CALLBACKTYPE_sysenter,
+ .address = { __KERNEL_CS, (unsigned long)ia32pv_sysenter_target },
+ };
+
+ if (!boot_cpu_has(X86_FEATURE_SEP))
+ return;
+
+ get_cpu();
+
+ if (xen_feature(XENFEAT_supervisor_mode_kernel))
+ sysenter.address.eip = (unsigned long)ia32_sysenter_target;
+
+ switch (HYPERVISOR_callback_op(CALLBACKOP_register, &sysenter)) {
+ case 0:
+ break;
+#if CONFIG_XEN_COMPAT < 0x030200
+ case -ENOSYS:
+ sysenter.type = CALLBACKTYPE_sysenter_deprecated;
+ if (HYPERVISOR_callback_op(CALLBACKOP_register, &sysenter) == 0)
+ break;
+#endif
+ default:
+ clear_bit(X86_FEATURE_SEP, boot_cpu_data.x86_capability);
+ break;
+ }
+#endif
}
static struct vm_area_struct gate_vma;

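The CONFIG_XEN kexec macros above (`kexec_page_to_pfn`, `kexec_pfn_to_page`, and friends) exist because a paravirtualized guest addresses memory by pseudo-physical frame numbers (pfns) while the hypervisor and the hardware work with machine frame numbers (mfns). A minimal standalone sketch of that two-way translation, using a toy p2m table in place of the guest's real phys-to-machine map (all `toy_*` names are illustrative, not kernel APIs):

```c
#include <assert.h>

/* Illustrative sketch only: the kernel's pfn_to_mfn()/mfn_to_pfn() walk
 * the guest's real phys-to-machine table; this toy array stands in for
 * it so the translation idea can be exercised in isolation. */

#define TOY_NR_PAGES 4

/* toy phys-to-machine map: pfn -> mfn (values are arbitrary) */
static const unsigned long toy_p2m[TOY_NR_PAGES] = { 7, 3, 9, 1 };

/* forward map: pseudo-physical frame -> machine frame */
static unsigned long toy_pfn_to_mfn(unsigned long pfn)
{
    return toy_p2m[pfn];
}

/* reverse map: machine frame -> pseudo-physical frame, or -1 if absent */
static unsigned long toy_mfn_to_pfn(unsigned long mfn)
{
    for (unsigned long pfn = 0; pfn < TOY_NR_PAGES; pfn++)
        if (toy_p2m[pfn] == mfn)
            return pfn;
    return (unsigned long)-1;
}
```

With this split, the generic kexec code can keep manipulating "physical" addresses while the `kexec_*` macro layer quietly hands the hypervisor the machine addresses it actually needs.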
xen3-auto-arch-x86_64.diff (new file, 211 lines)

Subject: xen3 arch-x86_64
From: http://xenbits.xensource.com/linux-2.6.18-xen.hg (tip 1011:11175e60d393)
Patch-mainline: obsolete
Acked-by: jbeulich@novell.com
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/asm-offsets_64.c 2010-03-01 14:09:07.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/kernel/asm-offsets_64.c 2009-12-04 10:44:49.000000000 +0100
@@ -115,8 +115,10 @@ int main(void)
ENTRY(cr8);
BLANK();
#undef ENTRY
+#ifndef CONFIG_X86_NO_TSS
DEFINE(TSS_ist, offsetof(struct tss_struct, x86_tss.ist));
BLANK();
+#endif
DEFINE(crypto_tfm_ctx_offset, offsetof(struct crypto_tfm, __crt_ctx));
BLANK();
DEFINE(__NR_syscall_max, sizeof(syscalls) - 1);
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/machine_kexec_64.c 2009-04-21 10:35:13.000000000 +0200
+++ sle11sp1-2010-03-01/arch/x86/kernel/machine_kexec_64.c 2009-12-04 10:44:49.000000000 +0100
@@ -19,6 +19,119 @@
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
+#ifdef CONFIG_XEN
+
+/* In the case of Xen, override hypervisor functions to be able to create
+ * a regular identity mapping page table...
+ */
+
+#include <xen/interface/kexec.h>
+#include <xen/interface/memory.h>
+
+#define x__pmd(x) ((pmd_t) { (x) } )
+#define x__pud(x) ((pud_t) { (x) } )
+#define x__pgd(x) ((pgd_t) { (x) } )
+
+#define x_pmd_val(x) ((x).pmd)
+#define x_pud_val(x) ((x).pud)
+#define x_pgd_val(x) ((x).pgd)
+
+static inline void x_set_pmd(pmd_t *dst, pmd_t val)
+{
+ x_pmd_val(*dst) = x_pmd_val(val);
+}
+
+static inline void x_set_pud(pud_t *dst, pud_t val)
+{
+ x_pud_val(*dst) = phys_to_machine(x_pud_val(val));
+}
+
+static inline void x_pud_clear (pud_t *pud)
+{
+ x_pud_val(*pud) = 0;
+}
+
+static inline void x_set_pgd(pgd_t *dst, pgd_t val)
+{
+ x_pgd_val(*dst) = phys_to_machine(x_pgd_val(val));
+}
+
+static inline void x_pgd_clear (pgd_t * pgd)
+{
+ x_pgd_val(*pgd) = 0;
+}
+
+#define X__PAGE_KERNEL_LARGE_EXEC \
+ _PAGE_PRESENT | _PAGE_RW | _PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_PSE
+#define X_KERNPG_TABLE _PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY
+
+#define __ma(x) (pfn_to_mfn(__pa((x)) >> PAGE_SHIFT) << PAGE_SHIFT)
+
+#if PAGES_NR > KEXEC_XEN_NO_PAGES
+#error PAGES_NR is greater than KEXEC_XEN_NO_PAGES - Xen support will break
+#endif
+
+#if PA_CONTROL_PAGE != 0
+#error PA_CONTROL_PAGE is non zero - Xen support will break
+#endif
+
+void machine_kexec_setup_load_arg(xen_kexec_image_t *xki, struct kimage *image)
+{
+ void *control_page;
+ void *table_page;
+
+ memset(xki->page_list, 0, sizeof(xki->page_list));
+
+ control_page = page_address(image->control_code_page) + PAGE_SIZE;
+ memcpy(control_page, relocate_kernel, PAGE_SIZE);
+
+ table_page = page_address(image->control_code_page);
+
+ xki->page_list[PA_CONTROL_PAGE] = __ma(control_page);
+ xki->page_list[PA_TABLE_PAGE] = __ma(table_page);
+
+ xki->page_list[PA_PGD] = __ma(kexec_pgd);
+ xki->page_list[PA_PUD_0] = __ma(kexec_pud0);
+ xki->page_list[PA_PUD_1] = __ma(kexec_pud1);
+ xki->page_list[PA_PMD_0] = __ma(kexec_pmd0);
+ xki->page_list[PA_PMD_1] = __ma(kexec_pmd1);
+ xki->page_list[PA_PTE_0] = __ma(kexec_pte0);
+ xki->page_list[PA_PTE_1] = __ma(kexec_pte1);
+}
+
+int __init machine_kexec_setup_resources(struct resource *hypervisor,
+ struct resource *phys_cpus,
+ int nr_phys_cpus)
+{
+ int k;
+
+ /* The per-cpu crash note resources belong to the hypervisor resource */
+ for (k = 0; k < nr_phys_cpus; k++)
+ request_resource(hypervisor, phys_cpus + k);
+
+ return 0;
+}
+
+void machine_kexec_register_resources(struct resource *res) { ; }
+
+#else /* CONFIG_XEN */
+
+#define x__pmd(x) __pmd(x)
+#define x__pud(x) __pud(x)
+#define x__pgd(x) __pgd(x)
+
+#define x_set_pmd(x, y) set_pmd(x, y)
+#define x_set_pud(x, y) set_pud(x, y)
+#define x_set_pgd(x, y) set_pgd(x, y)
+
+#define x_pud_clear(x) pud_clear(x)
+#define x_pgd_clear(x) pgd_clear(x)
+
+#define X__PAGE_KERNEL_LARGE_EXEC __PAGE_KERNEL_LARGE_EXEC
+#define X_KERNPG_TABLE _KERNPG_TABLE
+
+#endif /* CONFIG_XEN */
+
static int init_one_level2_page(struct kimage *image, pgd_t *pgd,
unsigned long addr)
{
@@ -61,7 +174,7 @@ static void init_level2_page(pmd_t *leve
addr &= PAGE_MASK;
end_addr = addr + PUD_SIZE;
while (addr < end_addr) {
- set_pmd(level2p++, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
+ x_set_pmd(level2p++, x__pmd(addr | X__PAGE_KERNEL_LARGE_EXEC));
addr += PMD_SIZE;
}
}
@@ -86,12 +199,12 @@ static int init_level3_page(struct kimag
}
level2p = (pmd_t *)page_address(page);
init_level2_page(level2p, addr);
- set_pud(level3p++, __pud(__pa(level2p) | _KERNPG_TABLE));
+ x_set_pud(level3p++, x__pud(__pa(level2p) | X_KERNPG_TABLE));
addr += PUD_SIZE;
}
/* clear the unused entries */
while (addr < end_addr) {
- pud_clear(level3p++);
+ x_pud_clear(level3p++);
addr += PUD_SIZE;
}
out:
@@ -121,12 +234,12 @@ static int init_level4_page(struct kimag
result = init_level3_page(image, level3p, addr, last_addr);
if (result)
goto out;
- set_pgd(level4p++, __pgd(__pa(level3p) | _KERNPG_TABLE));
+ x_set_pgd(level4p++, x__pgd(__pa(level3p) | X_KERNPG_TABLE));
addr += PGDIR_SIZE;
}
/* clear the unused entries */
while (addr < end_addr) {
- pgd_clear(level4p++);
+ x_pgd_clear(level4p++);
addr += PGDIR_SIZE;
}
out:
@@ -187,8 +300,14 @@ static int init_pgtable(struct kimage *i
{
pgd_t *level4p;
int result;
+ unsigned long x_max_pfn = max_pfn;
+
+#ifdef CONFIG_XEN
+ x_max_pfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL);
+#endif
+
level4p = (pgd_t *)__va(start_pgtable);
- result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+ result = init_level4_page(image, level4p, 0, x_max_pfn << PAGE_SHIFT);
if (result)
return result;
/*
@@ -222,6 +341,7 @@ void machine_kexec_cleanup(struct kimage
free_transition_pgtable(image);
}
+#ifndef CONFIG_XEN
/*
* Do not allocate memory (or fail in any way) in machine_kexec().
* We are past the point of no return, committed to rebooting now.
@@ -280,6 +400,7 @@ void machine_kexec(struct kimage *image)
__ftrace_enabled_restore(save_ftrace_enabled);
}
+#endif
void arch_crash_save_vmcoreinfo(void)
{

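The `x_set_pud()`/`x_set_pgd()` wrappers in the hunk above translate the physical address stored in an upper-level page-table entry through `phys_to_machine()` while leaving the low permission bits (e.g. `X_KERNPG_TABLE`) untouched. A hedged sketch of that address/flags split, with a made-up linear pfn-to-mfn offset standing in for the real translation (the `TOY_*` constants and helpers are assumptions for illustration, not kernel code):

```c
#include <assert.h>

#define TOY_PAGE_SHIFT 12
#define TOY_FLAG_MASK  ((1UL << TOY_PAGE_SHIFT) - 1)

/* toy phys-to-machine translation: machine frames sit at a fixed
 * offset from pseudo-physical frames (a real p2m map is arbitrary) */
static unsigned long toy_phys_to_machine(unsigned long phys)
{
    unsigned long pfn = phys >> TOY_PAGE_SHIFT;
    return (pfn + 100) << TOY_PAGE_SHIFT;
}

/* mirrors the x_set_pud() idea: translate only the address part of the
 * entry, preserve the low flag bits verbatim */
static unsigned long toy_make_pud(unsigned long table_phys, unsigned long flags)
{
    return toy_phys_to_machine(table_phys & ~TOY_FLAG_MASK)
           | (flags & TOY_FLAG_MASK);
}
```

This is why the patch introduces the `x_*` aliases at all: in the native build they collapse to the plain `set_pud()`/`set_pgd()` macros, while the Xen build gets the translating variants without touching the shared page-table construction loops.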
xen3-auto-common.diff (new file, 4158 lines; diff suppressed, too large)
xen3-auto-xen-arch.diff (new file, 45322 lines; diff suppressed, too large)
xen3-auto-xen-drivers.diff (new file, 60133 lines; diff suppressed, too large)
xen3-auto-xen-kconfig.diff (new file, 854 lines)

Subject: xen3 xen-kconfig
From: http://xenbits.xensource.com/linux-2.6.18-xen.hg (tip 1011:11175e60d393)
Patch-mainline: obsolete
Acked-by: jbeulich@novell.com
--- sle11sp1-2010-03-22.orig/arch/x86/Kconfig 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/Kconfig 2010-02-09 16:43:56.000000000 +0100
@@ -63,6 +63,7 @@ config ARCH_DEFCONFIG
config GENERIC_TIME
def_bool y
+ depends on !X86_XEN
config GENERIC_CMOS_UPDATE
def_bool y
@@ -213,12 +214,23 @@ config X86_64_SMP
config X86_HT
bool
- depends on SMP
+ depends on SMP && !XEN
default y
config X86_TRAMPOLINE
bool
depends on SMP || (64BIT && ACPI_SLEEP)
+ depends on !XEN
+ default y
+
+config X86_NO_TSS
+ bool
+ depends on X86_XEN || X86_64_XEN
+ default y
+
+config X86_NO_IDT
+ bool
+ depends on X86_XEN || X86_64_XEN
default y
config X86_32_LAZY_GS
@@ -298,6 +310,17 @@ config X86_MPPARSE
For old smp systems that do not have proper acpi support. Newer systems
(esp with 64bit cpus) with acpi support, MADT and DSDT will override it
+config X86_XEN
+ bool "Xen-compatible"
+ select XEN
+ select X86_PAE
+ select X86_UP_APIC if !SMP && XEN_PRIVILEGED_GUEST
+ select X86_UP_IOAPIC if !SMP && XEN_PRIVILEGED_GUEST
+ select SWIOTLB
+ help
+ Choose this option if you plan to run this kernel on top of the
+ Xen Hypervisor.
+
config X86_BIGSMP
bool "Support for big SMP systems with more than 8 CPUs"
depends on X86_32 && SMP
@@ -327,6 +350,13 @@ config X86_EXTENDED_PLATFORM
generic distribution kernel, say Y here - otherwise say N.
endif
+config X86_64_XEN
+ bool "Enable Xen compatible kernel"
+ select XEN
+ select SWIOTLB
+ help
+	  This option will compile a kernel compatible with the Xen hypervisor.
+
if X86_64
config X86_EXTENDED_PLATFORM
bool "Support for extended (non-PC) x86 platforms"
@@ -639,6 +669,7 @@ source "arch/x86/Kconfig.cpu"
config HPET_TIMER
def_bool X86_64
prompt "HPET Timer Support" if X86_32
+ depends on !X86_XEN && !X86_64_XEN
---help---
Use the IA-PC HPET (High Precision Event Timer) to manage
time in preference to the PIT and RTC, if a HPET is
@@ -674,7 +705,7 @@ config GART_IOMMU
bool "GART IOMMU support" if EMBEDDED
default y
select SWIOTLB
- depends on X86_64 && PCI
+ depends on X86_64 && PCI && !X86_64_XEN
---help---
Support for full DMA access of devices with 32bit memory access only
on systems with more than 3GB. This is usually needed for USB,
@@ -689,7 +720,7 @@ config GART_IOMMU
config CALGARY_IOMMU
bool "IBM Calgary IOMMU support"
select SWIOTLB
- depends on X86_64 && PCI && EXPERIMENTAL
+ depends on X86_64 && PCI && !X86_64_XEN && EXPERIMENTAL
---help---
Support for hardware IOMMUs in IBM's xSeries x366 and x460
systems. Needed to run systems with more than 3GB of memory
@@ -773,6 +804,7 @@ config NR_CPUS
default "1" if !SMP
default "4096" if MAXSMP
default "32" if SMP && (X86_NUMAQ || X86_SUMMIT || X86_BIGSMP || X86_ES7000)
+ default "16" if X86_64_XEN
default "8" if SMP
---help---
This allows you to specify the maximum number of CPUs which this
@@ -804,7 +836,7 @@ source "kernel/Kconfig.preempt"
config X86_UP_APIC
bool "Local APIC support on uniprocessors"
- depends on X86_32 && !SMP && !X86_32_NON_STANDARD
+ depends on X86_32 && !SMP && !X86_32_NON_STANDARD && !XEN_UNPRIVILEGED_GUEST
---help---
A local APIC (Advanced Programmable Interrupt Controller) is an
integrated interrupt controller in the CPU. If you have a single-CPU
@@ -830,15 +862,22 @@ config X86_UP_IOAPIC
config X86_LOCAL_APIC
def_bool y
depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC
+ depends on !XEN_UNPRIVILEGED_GUEST
config X86_IO_APIC
def_bool y
depends on X86_64 || SMP || X86_32_NON_STANDARD || X86_UP_APIC
+ depends on !XEN_UNPRIVILEGED_GUEST
config X86_VISWS_APIC
def_bool y
depends on X86_32 && X86_VISWS
+config X86_XEN_GENAPIC
+ bool
+ depends on X86_64_XEN
+ default y
+
config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
bool "Reroute for broken boot IRQs"
default n
@@ -865,6 +904,7 @@ config X86_REROUTE_FOR_BROKEN_BOOT_IRQS
config X86_MCE
bool "Machine Check / overheating reporting"
+ depends on !X86_XEN && !XEN_UNPRIVILEGED_GUEST
---help---
Machine Check support allows the processor to notify the
kernel if it detects a problem (e.g. overheating, data corruption).
@@ -874,7 +914,7 @@ config X86_MCE
config X86_MCE_INTEL
def_bool y
prompt "Intel MCE features"
- depends on X86_MCE && X86_LOCAL_APIC
+ depends on X86_MCE && X86_LOCAL_APIC && !XEN
---help---
Additional support for intel specific MCE features such as
the thermal monitor.
@@ -890,7 +930,7 @@ config X86_MCE_XEON75XX
config X86_MCE_AMD
def_bool y
prompt "AMD MCE features"
- depends on X86_MCE && X86_LOCAL_APIC
+ depends on X86_MCE && X86_LOCAL_APIC && !XEN
---help---
Additional support for AMD specific MCE features such as
the DRAM Error Threshold.
@@ -917,6 +957,10 @@ config X86_MCE_INJECT
If you don't know what a machine check is and you don't do kernel
QA it is safe to say n.
+config X86_XEN_MCE
+ def_bool y
+ depends on XEN && X86_MCE
+
config X86_THERMAL_VECTOR
def_bool y
depends on X86_MCE_INTEL
@@ -969,7 +1013,7 @@ config I8K
config X86_REBOOTFIXUPS
bool "Enable X86 board specific fixups for reboot"
- depends on X86_32
+ depends on X86_32 && !X86_XEN
---help---
This enables chipset and/or board specific fixups to be done
in order to get reboot to work correctly. This is only needed on
@@ -986,6 +1030,7 @@ config X86_REBOOTFIXUPS
config MICROCODE
tristate "/dev/cpu/microcode - microcode support"
+ depends on !XEN_UNPRIVILEGED_GUEST
select FW_LOADER
---help---
If you say Y here, you will be able to update the microcode on
@@ -1176,7 +1221,7 @@ config DIRECT_GBPAGES
# Common NUMA Features
config NUMA
bool "Numa Memory Allocation and Scheduler Support"
- depends on SMP
+ depends on SMP && !XEN
depends on X86_64 || (X86_32 && HIGHMEM64G && (X86_NUMAQ || X86_BIGSMP || X86_SUMMIT && ACPI) && EXPERIMENTAL)
default y if (X86_NUMAQ || X86_SUMMIT || X86_BIGSMP)
---help---
@@ -1285,6 +1330,7 @@ config ARCH_SPARSEMEM_DEFAULT
config ARCH_SPARSEMEM_ENABLE
def_bool y
depends on X86_64 || NUMA || (EXPERIMENTAL && X86_32) || X86_32_NON_STANDARD
+ depends on !XEN
select SPARSEMEM_STATIC if X86_32
select SPARSEMEM_VMEMMAP_ENABLE if X86_64
@@ -1360,6 +1406,7 @@ config X86_RESERVE_LOW_64K
config MATH_EMULATION
bool
prompt "Math emulation" if X86_32
+ depends on !X86_XEN
---help---
Linux can emulate a math coprocessor (used for floating point
operations) if you don't have one. 486DX and Pentium processors have
@@ -1385,6 +1432,7 @@ config MATH_EMULATION
config MTRR
bool "MTRR (Memory Type Range Register) support"
+ depends on !XEN_UNPRIVILEGED_GUEST
---help---
On Intel P6 family processors (Pentium Pro, Pentium II and later)
the Memory Type Range Registers (MTRRs) may be used to control
@@ -1469,7 +1517,7 @@ config ARCH_USES_PG_UNCACHED
config EFI
bool "EFI runtime service support"
- depends on ACPI
+ depends on ACPI && !XEN
---help---
This enables the kernel to use EFI runtime services that are
available (such as the EFI variable services).
@@ -1529,6 +1577,7 @@ source kernel/Kconfig.hz
config KEXEC
bool "kexec system call"
+ depends on !XEN_UNPRIVILEGED_GUEST
---help---
kexec is a system call that implements the ability to shutdown your
current kernel, and to start another kernel. It is like a reboot
@@ -1546,6 +1595,7 @@ config KEXEC
config CRASH_DUMP
bool "kernel crash dumps"
depends on X86_64 || (X86_32 && HIGHMEM)
+ depends on !XEN
---help---
Generate crash dump after being started by kexec.
This should be normally only set in special crash dump kernels
@@ -1666,6 +1716,7 @@ config COMPAT_VDSO
def_bool y
prompt "Compat VDSO support"
depends on X86_32 || IA32_EMULATION
+ depends on !X86_XEN
---help---
Map the 32-bit VDSO to the predictable old-style address too.
---help---
@@ -1735,6 +1786,7 @@ config HAVE_ARCH_EARLY_PFN_TO_NID
depends on NUMA
menu "Power management and ACPI options"
+ depends on !XEN_UNPRIVILEGED_GUEST
config ARCH_HIBERNATION_HEADER
def_bool y
@@ -1753,7 +1805,7 @@ config X86_APM_BOOT
menuconfig APM
tristate "APM (Advanced Power Management) BIOS support"
- depends on X86_32 && PM_SLEEP
+ depends on X86_32 && PM_SLEEP && !XEN
---help---
APM is a BIOS specification for saving power using several different
techniques. This is mostly useful for battery powered laptops with
@@ -1914,6 +1966,7 @@ choice
config PCI_GOBIOS
bool "BIOS"
+ depends on !X86_XEN
config PCI_GOMMCONFIG
bool "MMConfig"
@@ -1925,6 +1978,13 @@ config PCI_GOOLPC
bool "OLPC"
depends on OLPC
+config PCI_GOXEN_FE
+ bool "Xen PCI Frontend"
+ depends on X86_XEN
+ help
+ The PCI device frontend driver allows the kernel to import arbitrary
+ PCI devices from a PCI backend to support PCI driver domains.
+
config PCI_GOANY
bool "Any"
@@ -1932,7 +1992,7 @@ endchoice
config PCI_BIOS
def_bool y
- depends on X86_32 && PCI && (PCI_GOBIOS || PCI_GOANY)
+ depends on X86_32 && PCI && !XEN && (PCI_GOBIOS || PCI_GOANY)
# x86-64 doesn't support PCI BIOS access from long mode so always go direct.
config PCI_DIRECT
@@ -1955,6 +2015,22 @@ config PCI_MMCONFIG
bool "Support mmconfig PCI config space access"
depends on X86_64 && PCI && ACPI
+config XEN_PCIDEV_FRONTEND
+ bool "Xen PCI Frontend" if X86_64
+ depends on PCI && ((X86_XEN && (PCI_GOXEN_FE || PCI_GOANY)) || X86_64_XEN)
+ select HOTPLUG
+ default y
+ help
+ The PCI device frontend driver allows the kernel to import arbitrary
+ PCI devices from a PCI backend to support PCI driver domains.
+
+config XEN_PCIDEV_FE_DEBUG
+ bool "Xen PCI Frontend Debugging"
+ depends on XEN_PCIDEV_FRONTEND
+ default n
+ help
+ Enables some debug statements within the PCI Frontend.
+
config DMAR
bool "Support for DMA Remapping Devices (EXPERIMENTAL)"
depends on PCI_MSI && ACPI && EXPERIMENTAL
@@ -2017,6 +2093,7 @@ if X86_32
config ISA
bool "ISA support"
+ depends on !XEN
---help---
Find out whether you have ISA slots on your motherboard. ISA is the
name of a bus system, i.e. the way the CPU talks to the other stuff
@@ -2044,6 +2121,7 @@ source "drivers/eisa/Kconfig"
config MCA
bool "MCA support"
+ depends on !XEN
---help---
MicroChannel Architecture is found in some IBM PS/2 machines and
laptops. It is a bus system similar to PCI or ISA. See
@@ -2157,4 +2235,6 @@ source "crypto/Kconfig"
source "arch/x86/kvm/Kconfig"
+source "drivers/xen/Kconfig"
+
source "lib/Kconfig"
--- sle11sp1-2010-03-22.orig/arch/x86/Kconfig.cpu 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/Kconfig.cpu 2009-12-04 10:44:40.000000000 +0100
@@ -340,7 +340,7 @@ config X86_PPRO_FENCE
config X86_F00F_BUG
def_bool y
- depends on M586MMX || M586TSC || M586 || M486 || M386
+ depends on (M586MMX || M586TSC || M586 || M486 || M386) && !X86_NO_IDT
config X86_WP_WORKS_OK
def_bool y
@@ -397,6 +397,7 @@ config X86_P6_NOP
config X86_TSC
def_bool y
depends on ((MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) && !X86_NUMAQ) || X86_64
+ depends on !XEN
config X86_CMPXCHG64
def_bool y
--- sle11sp1-2010-03-22.orig/arch/x86/Kconfig.debug 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/Kconfig.debug 2009-12-04 10:44:40.000000000 +0100
@@ -136,7 +136,7 @@ config 4KSTACKS
config DOUBLEFAULT
default y
bool "Enable doublefault exception handler" if EMBEDDED
- depends on X86_32
+ depends on X86_32 && !X86_NO_TSS
---help---
This option allows trapping of rare doublefault exceptions that
would otherwise cause a system to silently reboot. Disabling this
--- sle11sp1-2010-03-22.orig/drivers/acpi/Kconfig 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/acpi/Kconfig 2009-12-04 10:44:40.000000000 +0100
@@ -307,6 +307,7 @@ config ACPI_PCI_SLOT
config X86_PM_TIMER
bool "Power Management Timer Support" if EMBEDDED
depends on X86
+ depends on !XEN
default y
help
The Power Management Timer is available on all ACPI-capable,
@@ -360,4 +361,13 @@ config ACPI_SBS
To compile this driver as a module, choose M here:
the modules will be called sbs and sbshc.
+config ACPI_PV_SLEEP
+ bool
+ depends on X86 && XEN && ACPI_SLEEP
+ default y
+
+config PROCESSOR_EXTERNAL_CONTROL
+ bool
+ depends on (X86 || IA64) && XEN
+ default y
endif # ACPI
--- sle11sp1-2010-03-22.orig/drivers/char/Kconfig 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/char/Kconfig 2009-12-04 10:44:40.000000000 +0100
@@ -1052,7 +1052,7 @@ config MAX_RAW_DEVS
config HPET
bool "HPET - High Precision Event Timer" if (X86 || IA64)
default n
- depends on ACPI
+ depends on ACPI && !XEN
help
If you say Y here, you will have a miscdevice named "/dev/hpet/". Each
open selects one of the timers supported by the HPET. The timers are
--- sle11sp1-2010-03-22.orig/drivers/char/tpm/Kconfig 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/char/tpm/Kconfig 2009-12-04 10:44:40.000000000 +0100
@@ -58,4 +58,13 @@ config TCG_INFINEON
Further information on this driver and the supported hardware
can be found at http://www.prosec.rub.de/tpm
+config TCG_XEN
+ tristate "XEN TPM Interface"
+ depends on XEN
+ ---help---
+ If you want to make TPM support available to a Xen user domain,
+ say Yes and it will be accessible from within Linux.
+ To compile this driver as a module, choose M here; the module
+ will be called tpm_xenu.
+
endif # TCG_TPM
--- sle11sp1-2010-03-22.orig/drivers/cpufreq/Kconfig 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/cpufreq/Kconfig 2009-12-04 10:44:40.000000000 +0100
@@ -1,5 +1,6 @@
config CPU_FREQ
bool "CPU Frequency scaling"
+ depends on !PROCESSOR_EXTERNAL_CONTROL
help
CPU Frequency scaling allows you to change the clock speed of
CPUs on the fly. This is a nice method to save power, because
--- sle11sp1-2010-03-22.orig/drivers/serial/Kconfig 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/serial/Kconfig 2009-12-04 10:44:40.000000000 +0100
@@ -9,6 +9,7 @@ menu "Serial drivers"
# The new 8250/16550 serial drivers
config SERIAL_8250
tristate "8250/16550 and compatible serial support"
+ depends on !XEN_DISABLE_SERIAL
select SERIAL_CORE
---help---
This selects whether you want to include the driver for the standard
--- sle11sp1-2010-03-22.orig/drivers/xen/Kconfig 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/drivers/xen/Kconfig 2009-12-04 10:44:40.000000000 +0100
@@ -1,6 +1,354 @@
+#
+# This Kconfig describe xen options
+#
+
+mainmenu "Xen Configuration"
+
+config XEN
+ bool
+
+if XEN
+config XEN_INTERFACE_VERSION
+ hex
+ default 0x00030207
+
+menu "XEN"
+
+config XEN_PRIVILEGED_GUEST
+ bool "Privileged Guest (domain 0)"
+ select PCI_REASSIGN if PCI
+ help
+ Support for privileged operation (domain 0)
+
+config XEN_UNPRIVILEGED_GUEST
+ def_bool !XEN_PRIVILEGED_GUEST
+
+config XEN_PRIVCMD
+ def_bool y
+ depends on PROC_FS
+
+config XEN_XENBUS_DEV
+ def_bool y
+ depends on PROC_FS
+
+config XEN_NETDEV_ACCEL_SFC_UTIL
+ depends on X86
+ tristate
+
+config XEN_BACKEND
+ tristate "Backend driver support"
+ default XEN_PRIVILEGED_GUEST
+ help
+ Support for backend device drivers that provide I/O services
+ to other virtual machines.
+
+config XEN_BLKDEV_BACKEND
+ tristate "Block-device backend driver"
+ depends on XEN_BACKEND
+ default XEN_BACKEND
+ help
+ The block-device backend driver allows the kernel to export its
+ block devices to other guests via a high-performance shared-memory
+ interface.
+
+config XEN_BLKDEV_TAP
+ tristate "Block-device tap backend driver"
+ depends on XEN_BACKEND
+ default XEN_BACKEND
+ help
+ The block tap driver is an alternative to the block back driver
+ and allows VM block requests to be redirected to userspace through
+ a device interface. The tap allows user-space development of
+ high-performance block backends, where disk images may be implemented
+ as files, in memory, or on other hosts across the network. This
+ driver can safely coexist with the existing blockback driver.
+
+config XEN_BLKDEV_TAP2
+ tristate "Block-device tap backend driver 2"
+ depends on XEN_BACKEND
+ default XEN_BACKEND
+ help
+ The block tap driver is an alternative to the block back driver
+ and allows VM block requests to be redirected to userspace through
+ a device interface. The tap allows user-space development of
+ high-performance block backends, where disk images may be implemented
+ as files, in memory, or on other hosts across the network. This
+ driver can safely coexist with the existing blockback driver.
+
+config XEN_BLKBACK_PAGEMAP
+ tristate
+ depends on XEN_BLKDEV_BACKEND != n && XEN_BLKDEV_TAP2 != n
+ default XEN_BLKDEV_BACKEND || XEN_BLKDEV_TAP2
+
+config XEN_NETDEV_BACKEND
+ tristate "Network-device backend driver"
+ depends on XEN_BACKEND && NET
+ default XEN_BACKEND
+ help
+ The network-device backend driver allows the kernel to export its
+ network devices to other guests via a high-performance shared-memory
+ interface.
+
+config XEN_NETDEV_PIPELINED_TRANSMITTER
+ bool "Pipelined transmitter (DANGEROUS)"
+ depends on XEN_NETDEV_BACKEND
+ help
+ If the net backend is a dumb domain, such as a transparent Ethernet
+ bridge with no local IP interface, it is safe to say Y here to get
+ slightly lower network overhead.
+ If the backend has a local IP interface; or may be doing smart things
+ like reassembling packets to perform firewall filtering; or if you
+ are unsure; or if you experience network hangs when this option is
+ enabled; then you must say N here.
+
+config XEN_NETDEV_ACCEL_SFC_BACKEND
+ tristate "Network-device backend driver acceleration for Solarflare NICs"
+ depends on XEN_NETDEV_BACKEND && SFC && SFC_RESOURCE && X86
+ select XEN_NETDEV_ACCEL_SFC_UTIL
+ default m
+
+config XEN_NETDEV_LOOPBACK
+ tristate "Network-device loopback driver"
+ depends on XEN_NETDEV_BACKEND
+ help
+ A two-interface loopback device to emulate a local netfront-netback
+ connection. If unsure, it is probably safe to say N here.
+
+config XEN_PCIDEV_BACKEND
+ tristate "PCI-device backend driver"
+ depends on PCI && XEN_BACKEND
+ default XEN_BACKEND
+ help
+ The PCI device backend driver allows the kernel to export arbitrary
+ PCI devices to other guests. If you select this to be a module, you
+ will need to make sure no other driver has bound to the device(s)
+ you want to make visible to other guests.
+
+choice
+ prompt "PCI Backend Mode"
+ depends on XEN_PCIDEV_BACKEND
+ default XEN_PCIDEV_BACKEND_VPCI if !IA64
+ default XEN_PCIDEV_BACKEND_CONTROLLER if IA64
+
+config XEN_PCIDEV_BACKEND_VPCI
+ bool "Virtual PCI"
+ ---help---
+ This PCI Backend hides the true PCI topology and makes the frontend
+ think there is a single PCI bus with only the exported devices on it.
+ For example, a device at 03:05.0 will be re-assigned to 00:00.0. A
+ second device at 02:1a.1 will be re-assigned to 00:01.1.
+
+config XEN_PCIDEV_BACKEND_PASS
+ bool "Passthrough"
+ ---help---
+ This PCI Backend provides a real view of the PCI topology to the
+ frontend (for example, a device at 06:01.b will still appear at
+ 06:01.b to the frontend). This is similar to how Xen 2.0.x exposed
+ PCI devices to its driver domains. This may be required for drivers
+	  which depend on finding their hardware in certain bus/slot
+ locations.
+
+config XEN_PCIDEV_BACKEND_SLOT
+ bool "Slot"
+ ---help---
+ This PCI Backend hides the true PCI topology and makes the frontend
+ think there is a single PCI bus with only the exported devices on it.
+ Contrary to the virtual PCI backend, a function becomes a new slot.
+ For example, a device at 03:05.2 will be re-assigned to 00:00.0. A
+ second device at 02:1a.1 will be re-assigned to 00:01.0.
+
+config XEN_PCIDEV_BACKEND_CONTROLLER
+ bool "Controller"
+ depends on IA64
+ ---help---
+ This PCI backend virtualizes the PCI bus topology by providing a
+ virtual bus per PCI root device. Devices which are physically under
+ the same root bus will appear on the same virtual bus. For systems
+ with complex I/O addressing, this is the only backend which supports
+ extended I/O port spaces and MMIO translation offsets. This backend
+ also supports slot virtualization. For example, a device at
+ 0000:01:02.1 will be re-assigned to 0000:00:00.0. A second device
+ at 0000:02:05.0 (behind a P2P bridge on bus 0000:01) will be
+ re-assigned to 0000:00:01.0. A third device at 0000:16:05.0 (under
+ a different PCI root bus) will be re-assigned to 0000:01:00.0.
+
+endchoice
+
+config XEN_PCIDEV_BE_DEBUG
+ bool "PCI Backend Debugging"
+ depends on XEN_PCIDEV_BACKEND
+
+config XEN_TPMDEV_BACKEND
+ tristate "TPM-device backend driver"
+ depends on XEN_BACKEND
+ help
+ The TPM-device backend driver
+
+config XEN_SCSI_BACKEND
+ tristate "SCSI backend driver"
+ depends on SCSI && XEN_BACKEND
+ default m
+ help
+ The SCSI backend driver allows the kernel to export its SCSI Devices
+ to other guests via a high-performance shared-memory interface.
+
+config XEN_USB_BACKEND
+ tristate "USB backend driver"
+ depends on USB && XEN_BACKEND
+ default m
+ help
+ The USB backend driver allows the kernel to export its USB Devices
+ to other guests.
+
+config XEN_BLKDEV_FRONTEND
+ tristate "Block-device frontend driver"
+ default y
+ help
+ The block-device frontend driver allows the kernel to access block
+ devices mounted within another guest OS. Unless you are building a
+ dedicated device-driver domain or your master control domain
+ (domain 0), you almost certainly want to say Y here.
+
+config XEN_NETDEV_FRONTEND
+ tristate "Network-device frontend driver"
+ depends on NET
+ default y
+ help
+ The network-device frontend driver allows the kernel to access
+ network interfaces within another guest OS. Unless you are building a
+ dedicated device-driver domain or your master control domain
+ (domain 0), you almost certainly want to say Y here.
+
+config XEN_NETDEV_ACCEL_SFC_FRONTEND
+ tristate "Network-device frontend driver acceleration for Solarflare NICs"
+ depends on XEN_NETDEV_FRONTEND && X86
+ select XEN_NETDEV_ACCEL_SFC_UTIL
+ default m
+
+config XEN_SCSI_FRONTEND
+ tristate "SCSI frontend driver"
+ depends on SCSI
+ default m
+ help
+ The SCSI frontend driver allows the kernel to access SCSI Devices
+ within another guest OS.
+
+config XEN_USB_FRONTEND
+ tristate "USB frontend driver"
+ depends on USB
+ default m
+ help
+ The USB frontend driver allows the kernel to access USB Devices
+ within another guest OS.
+
+config XEN_USB_FRONTEND_HCD_STATS
+ bool "Take HCD statistics (for debugging)"
+ depends on XEN_USB_FRONTEND
+ default y
+ help
+ Count transferred URB statuses and RING_FULL occurrences.
+
+config XEN_USB_FRONTEND_HCD_PM
+ bool "HCD suspend/resume support (DO NOT USE)"
+ depends on XEN_USB_FRONTEND
+ default n
+ help
+ Experimental bus suspend/resume feature support.
+
+config XEN_GRANT_DEV
+ tristate "User-space granted page access driver"
+ default XEN_PRIVILEGED_GUEST
+ help
+ Device for accessing (in user-space) pages that have been granted
+ by other domains.
+
+config XEN_FRAMEBUFFER
+ tristate "Framebuffer-device frontend driver"
+ depends on FB
+ select FB_CFB_FILLRECT
+ select FB_CFB_COPYAREA
+ select FB_CFB_IMAGEBLIT
+ default y
+ help
+ The framebuffer-device frontend driver allows the kernel to create a
+ virtual framebuffer. This framebuffer can be viewed in another
+ domain. Unless this domain has access to a real video card, you
+ probably want to say Y here.
+
+config XEN_KEYBOARD
+ tristate "Keyboard-device frontend driver"
+ depends on XEN_FRAMEBUFFER && INPUT
+ default y
+ help
+ The keyboard-device frontend driver allows the kernel to create a
+ virtual keyboard. This keyboard can then be driven by another
+ domain. If you've said Y to CONFIG_XEN_FRAMEBUFFER, you probably
+ want to say Y here.
+
+config XEN_DISABLE_SERIAL
+ bool "Disable serial port drivers"
+ default y
+ help
+ Disable serial port drivers, allowing the Xen console driver
+ to provide a serial console at ttyS0.
+
+config XEN_SYSFS
+ tristate "Export Xen attributes in sysfs"
+ depends on SYSFS
+ select SYS_HYPERVISOR
+ default y
+ help
+ Xen hypervisor attributes will show up under /sys/hypervisor/.
+
+choice
+ prompt "Xen version compatibility"
+ default XEN_COMPAT_030002_AND_LATER
+
+ config XEN_COMPAT_030002_AND_LATER
+ bool "3.0.2 and later"
+
+ config XEN_COMPAT_030004_AND_LATER
+ bool "3.0.4 and later"
+
+ config XEN_COMPAT_030100_AND_LATER
+ bool "3.1.0 and later"
+
+ config XEN_COMPAT_LATEST_ONLY
+ bool "no compatibility code"
+
+endchoice
+
+config XEN_COMPAT
+ hex
+ default 0xffffff if XEN_COMPAT_LATEST_ONLY
+ default 0x030100 if XEN_COMPAT_030100_AND_LATER
+ default 0x030004 if XEN_COMPAT_030004_AND_LATER
+ default 0x030002 if XEN_COMPAT_030002_AND_LATER
+ default 0
+
+endmenu
+
+config HAVE_IRQ_IGNORE_UNHANDLED
+ def_bool y
+
+config NO_IDLE_HZ
+ def_bool y
+
+config XEN_SMPBOOT
+ def_bool y
+ depends on SMP && !PPC_XEN
+
+config XEN_XENCOMM
+ bool
+
+config XEN_DEVMEM
+ def_bool y
+
+endif
+
config XEN_BALLOON
- bool "Xen memory balloon driver"
- depends on XEN
+ bool "Xen memory balloon driver" if PARAVIRT_XEN
+ depends on (XEN && !PPC_XEN) || PARAVIRT_XEN
default y
help
The balloon driver allows the Xen domain to request more memory from
@@ -8,14 +356,16 @@ config XEN_BALLOON
return unneeded memory to the system.
config XEN_SCRUB_PAGES
- bool "Scrub pages before returning them to system"
- depends on XEN_BALLOON
+ bool "Scrub memory before freeing it to Xen"
+ depends on XEN || XEN_BALLOON
default y
help
- Scrub pages before returning them to the system for reuse by
- other domains. This makes sure that any confidential data
- is not accidentally visible to other domains. Is it more
- secure, but slightly less efficient.
+ Erase memory contents before freeing it back to Xen's global
+ pool. This ensures that any secrets contained within that
+ memory (e.g., private keys) cannot be found by other guests that
+ may be running on the machine. Most people will want to say Y here.
+ If security is not a concern then you may increase performance by
+ saying N.
If in doubt, say yes.
config XEN_DEV_EVTCHN
--- sle11sp1-2010-03-22.orig/fs/Kconfig 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/fs/Kconfig 2009-12-04 10:44:40.000000000 +0100
@@ -160,6 +160,7 @@ config HUGETLBFS
bool "HugeTLB file system support"
depends on X86 || IA64 || SPARC64 || (S390 && 64BIT) || \
SYS_SUPPORTS_HUGETLBFS || BROKEN
+ depends on !XEN
help
hugetlbfs is a filesystem backing for HugeTLB pages, based on
ramfs. For architectures that support it, say Y here and read
--- sle11sp1-2010-03-22.orig/kernel/Kconfig.preempt 2010-03-22 12:07:55.000000000 +0100
+++ sle11sp1-2010-03-22/kernel/Kconfig.preempt 2009-12-04 10:44:40.000000000 +0100
@@ -36,6 +36,7 @@ config PREEMPT_VOLUNTARY
config PREEMPT
bool "Preemptible Kernel (Low-Latency Desktop)"
+ depends on !XEN
help
This option reduces the latency of the kernel by making
all kernel code (that is not executing in a critical section)
From: Jack Steiner <steiner@sgi.com>
Subject: x86: UV SGI: Don't track GRU space in PAT
References: bnc#561933, fate#306952
Patch-mainline: 2.6.33-rc1
Git-commit: fd12a0d69aee6d90fa9b9890db24368a897f8423
Commit fd12a0d69aee6d90fa9b9890db24368a897f8423 upstream.
GRU space is always mapped as WB in the page table. There is
no need to track the mappings in the PAT. This also eliminates
the "freeing invalid memtype" messages when the GRU space is unmapped.
Version 2 with changes suggested by Ingo (at least I think I understood what
he wanted).
Version 3 with changes suggested by Peter to make the new function
a member of the x86_platform structure.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Rafael J. Wysocki <rjw@suse.de>
Automatically created from "patches.arch/bug-561933_uv_pat_is_gru_range.patch" by xen-port-patches.py
--- sle11sp1-2010-03-22.orig/arch/x86/kernel/x86_init-xen.c 2009-11-06 10:52:23.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/kernel/x86_init-xen.c 2009-12-16 12:13:32.000000000 +0100
@@ -13,6 +13,7 @@
#include <asm/e820.h>
#include <asm/time.h>
#include <asm/irq.h>
+#include <asm/pat.h>
void __cpuinit x86_init_noop(void) { }
void __init x86_init_uint_noop(unsigned int unused) { }
@@ -64,6 +65,7 @@ struct x86_init_ops x86_init __initdata
};
struct x86_platform_ops x86_platform = {
+ .is_untracked_pat_range = default_is_untracked_pat_range,
.calibrate_tsc = NULL,
.get_wallclock = mach_get_cmos_time,
.set_wallclock = mach_set_rtc_mmss,
--- sle11sp1-2010-03-22.orig/arch/x86/mm/pat-xen.c 2010-03-22 12:52:42.000000000 +0100
+++ sle11sp1-2010-03-22/arch/x86/mm/pat-xen.c 2010-03-22 12:52:58.000000000 +0100
@@ -20,6 +20,7 @@
#include <asm/cacheflush.h>
#include <asm/processor.h>
#include <asm/tlbflush.h>
+#include <asm/x86_init.h>
#include <asm/pgtable.h>
#include <asm/fcntl.h>
#include <asm/e820.h>
@@ -372,6 +373,11 @@ static int free_ram_pages_type(u64 start
return 0;
}
+int default_is_untracked_pat_range(u64 start, u64 end)
+{
+ return is_ISA_range(start, end);
+}
+
/*
* req_type typically has one of the:
* - _PAGE_CACHE_WB
@@ -412,7 +418,7 @@ int reserve_memtype(u64 start, u64 end,
}
/* Low ISA region is always mapped WB in page table. No need to track */
- if (is_ISA_range(start, end - 1)) {
+ if (x86_platform.is_untracked_pat_range(start, end - 1)) {
if (new_type)
*new_type = _PAGE_CACHE_WB;
return 0;
@@ -521,7 +527,7 @@ int free_memtype(u64 start, u64 end)
return 0;
/* Low ISA region is always mapped WB. No need to track */
- if (is_ISA_range(start, end - 1))
+ if (x86_platform.is_untracked_pat_range(start, end - 1))
return 0;
is_range_ram = pat_pagerange_is_ram(start, end);
@@ -603,7 +609,7 @@ static unsigned long lookup_memtype(u64
int rettype = _PAGE_CACHE_WB;
struct memtype *entry;
- if (is_ISA_range(paddr, paddr + PAGE_SIZE - 1))
+ if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE - 1))
return rettype;
if (pat_pagerange_is_ram(paddr, paddr + PAGE_SIZE)) {
From: Lin Ming <ming.m.lin@intel.com>
Subject: timekeeping: Fix clock_gettime vsyscall time warp
Patch-mainline: 0696b711e4be45fa104c12329f617beb29c03f78
References: bnc#569238
commit 0696b711e4be45fa104c12329f617beb29c03f78
Author: Lin Ming <ming.m.lin@intel.com>
Date: Tue Nov 17 13:49:50 2009 +0800
timekeeping: Fix clock_gettime vsyscall time warp
Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
to struct timekeeper" the clock multiplier of vsyscall is updated with
the unmodified clock multiplier of the clock source and not with the
NTP adjusted multiplier of the timekeeper.
This causes user space observable time warps:
new CLOCK-warp maximum: 120 nsecs, 00000025c337c537 -> 00000025c337c4bf
Add a new argument "mult" to update_vsyscall() and hand in the
timekeeping internal NTP adjusted multiplier.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kurt Garloff <garloff@suse.de>
Automatically created from "patches.fixes/fix_clock_gettime_vsyscall_time_warp.diff" by xen-port-patches.py
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/vsyscall_64-xen.c 2009-11-06 10:52:23.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/kernel/vsyscall_64-xen.c 2010-03-01 14:44:50.000000000 +0100
@@ -73,7 +73,8 @@ void update_vsyscall_tz(void)
write_sequnlock_irqrestore(&vsyscall_gtod_data.lock, flags);
}
-void update_vsyscall(struct timespec *wall_time, struct clocksource *clock)
+void update_vsyscall(struct timespec *wall_time, struct clocksource *clock,
+ u32 mult)
{
unsigned long flags;
@@ -82,7 +83,7 @@ void update_vsyscall(struct timespec *wa
vsyscall_gtod_data.clock.vread = clock->vread;
vsyscall_gtod_data.clock.cycle_last = clock->cycle_last;
vsyscall_gtod_data.clock.mask = clock->mask;
- vsyscall_gtod_data.clock.mult = clock->mult;
+ vsyscall_gtod_data.clock.mult = mult;
vsyscall_gtod_data.clock.shift = clock->shift;
vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec;
vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec;
xen3-fixup-arch-x86 Normal file
Subject: xen3 x86 build fixes.
From: jbeulich@novell.com
Patch-mainline: obsolete
--- head-2010-01-18.orig/arch/x86/include/asm/topology.h 2010-01-18 15:20:21.000000000 +0100
+++ head-2010-01-18/arch/x86/include/asm/topology.h 2009-10-15 11:04:35.000000000 +0200
@@ -30,7 +30,7 @@
# define ENABLE_TOPO_DEFINES
# endif
#else
-# ifdef CONFIG_SMP
+# if defined(CONFIG_SMP) && !defined(CONFIG_XEN)
# define ENABLE_TOPO_DEFINES
# endif
#endif
--- head-2010-01-18.orig/arch/x86/kernel/cpu/intel_cacheinfo.c 2010-01-18 15:20:21.000000000 +0100
+++ head-2010-01-18/arch/x86/kernel/cpu/intel_cacheinfo.c 2010-01-18 16:16:46.000000000 +0100
@@ -502,7 +502,7 @@ unsigned int __cpuinit init_intel_cachei
static DEFINE_PER_CPU(struct _cpuid4_info *, cpuid4_info);
#define CPUID4_INFO_IDX(x, y) (&((per_cpu(cpuid4_info, x))[y]))
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && !defined(CONFIG_XEN)
static void __cpuinit cache_shared_cpu_map_setup(unsigned int cpu, int index)
{
struct _cpuid4_info *this_leaf, *sibling_leaf;
--- head-2010-01-18.orig/arch/x86/power/Makefile 2010-01-18 15:20:21.000000000 +0100
+++ head-2010-01-18/arch/x86/power/Makefile 2009-10-12 15:43:36.000000000 +0200
@@ -5,3 +5,5 @@ CFLAGS_cpu.o := $(nostackp)
obj-$(CONFIG_PM_SLEEP) += cpu.o
obj-$(CONFIG_HIBERNATION) += hibernate_$(BITS).o hibernate_asm_$(BITS).o
+
+disabled-obj-$(CONFIG_XEN) := cpu.o
--- head-2010-01-18.orig/arch/x86/power/cpu.c 2009-12-04 10:44:45.000000000 +0100
+++ head-2010-01-18/arch/x86/power/cpu.c 2009-10-12 15:43:36.000000000 +0200
@@ -125,7 +125,6 @@ static void do_fpu_end(void)
static void fix_processor_context(void)
{
-#ifndef CONFIG_X86_NO_TSS
int cpu = smp_processor_id();
struct tss_struct *t = &per_cpu(init_tss, cpu);
@@ -138,10 +137,7 @@ static void fix_processor_context(void)
#ifdef CONFIG_X86_64
get_cpu_gdt_table(cpu)[GDT_ENTRY_TSS].type = 9;
-#endif
-#endif
-#ifdef CONFIG_X86_64
syscall_init(); /* This sets MSR_*STAR and related */
#endif
load_TR_desc(); /* This does ltr */
xen3-fixup-common Normal file
Subject: Fix xen build.
From: jbeulich@novell.com
Patch-mainline: obsolete
--- sle11sp1-2010-03-11.orig/drivers/acpi/acpica/hwsleep.c 2009-12-04 10:44:41.000000000 +0100
+++ sle11sp1-2010-03-11/drivers/acpi/acpica/hwsleep.c 2009-11-06 10:45:37.000000000 +0100
@@ -419,6 +419,7 @@ ACPI_EXPORT_SYMBOL(acpi_enter_sleep_stat
* THIS FUNCTION MUST BE CALLED WITH INTERRUPTS DISABLED
*
******************************************************************************/
+#ifndef CONFIG_XEN
acpi_status asmlinkage acpi_enter_sleep_state_s4bios(void)
{
u32 in_value;
@@ -472,6 +473,7 @@ acpi_status asmlinkage acpi_enter_sleep_
}
ACPI_EXPORT_SYMBOL(acpi_enter_sleep_state_s4bios)
+#endif
/*******************************************************************************
*
--- sle11sp1-2010-03-11.orig/drivers/base/cpu.c 2010-03-11 09:10:11.000000000 +0100
+++ sle11sp1-2010-03-11/drivers/base/cpu.c 2010-01-27 14:29:14.000000000 +0100
@@ -112,7 +112,7 @@ static inline void register_cpu_control(
}
#endif /* CONFIG_HOTPLUG_CPU */
-#ifdef CONFIG_KEXEC
+#if defined(CONFIG_KEXEC) && !defined(CONFIG_XEN)
#include <linux/kexec.h>
static ssize_t show_crash_notes(struct sys_device *dev, struct sysdev_attribute *attr,
@@ -251,7 +251,7 @@ int __cpuinit register_cpu(struct cpu *c
if (!error)
register_cpu_under_node(num, cpu_to_node(num));
-#ifdef CONFIG_KEXEC
+#if defined(CONFIG_KEXEC) && !defined(CONFIG_XEN)
if (!error)
error = sysdev_create_file(&cpu->sysdev, &attr_crash_notes);
#endif
--- sle11sp1-2010-03-11.orig/drivers/ide/ide-lib.c 2009-12-04 10:44:41.000000000 +0100
+++ sle11sp1-2010-03-11/drivers/ide/ide-lib.c 2009-11-06 10:45:37.000000000 +0100
@@ -18,6 +18,16 @@ void ide_toggle_bounce(ide_drive_t *driv
{
u64 addr = BLK_BOUNCE_HIGH; /* dma64_addr_t */
+#ifndef CONFIG_XEN
+ if (!PCI_DMA_BUS_IS_PHYS) {
+ addr = BLK_BOUNCE_ANY;
+ } else if (on && drive->media == ide_disk) {
+ struct device *dev = drive->hwif->dev;
+
+ if (dev && dev->dma_mask)
+ addr = *dev->dma_mask;
+ }
+#else
if (on && drive->media == ide_disk) {
struct device *dev = drive->hwif->dev;
@@ -26,6 +36,7 @@ void ide_toggle_bounce(ide_drive_t *driv
else if (dev && dev->dma_mask)
addr = *dev->dma_mask;
}
+#endif
if (drive->queue)
blk_queue_bounce_limit(drive->queue, addr);
--- sle11sp1-2010-03-11.orig/drivers/oprofile/buffer_sync.c 2009-12-04 10:44:41.000000000 +0100
+++ sle11sp1-2010-03-11/drivers/oprofile/buffer_sync.c 2009-11-06 10:45:37.000000000 +0100
@@ -46,7 +46,9 @@ static cpumask_var_t marked_cpus;
static DEFINE_SPINLOCK(task_mortuary);
static void process_task_mortuary(void);
+#ifdef CONFIG_XEN
static int cpu_current_domain[NR_CPUS];
+#endif
/* Take ownership of the task struct and place it on the
* list for processing. Only after two full buffer syncs
@@ -158,11 +160,13 @@ static void end_sync(void)
int sync_start(void)
{
int err;
+#ifdef CONFIG_XEN
int i;
for (i = 0; i < NR_CPUS; i++) {
cpu_current_domain[i] = COORDINATOR_DOMAIN;
}
+#endif
if (!zalloc_cpumask_var(&marked_cpus, GFP_KERNEL))
return -ENOMEM;
@@ -312,12 +316,14 @@ static void add_cpu_mode_switch(unsigned
}
}
+#ifdef CONFIG_XEN
static void add_domain_switch(unsigned long domain_id)
{
add_event_entry(ESCAPE_CODE);
add_event_entry(DOMAIN_SWITCH_CODE);
add_event_entry(domain_id);
}
+#endif
static void
add_user_ctx_switch(struct task_struct const *task, unsigned long cookie)
@@ -540,10 +546,12 @@ void sync_buffer(int cpu)
add_cpu_switch(cpu);
+#ifdef CONFIG_XEN
/* We need to assign the first samples in this CPU buffer to the
same domain that we were processing at the last sync_buffer */
if (cpu_current_domain[cpu] != COORDINATOR_DOMAIN)
add_domain_switch(cpu_current_domain[cpu]);
+#endif
op_cpu_buffer_reset(cpu);
available = op_cpu_buffer_entries(cpu);
@@ -553,12 +561,14 @@ void sync_buffer(int cpu)
if (!sample)
break;
+#ifdef CONFIG_XEN
if (domain_switch) {
cpu_current_domain[cpu] = sample->eip;
add_domain_switch(sample->eip);
domain_switch = 0;
continue;
}
+#endif
if (is_code(sample->eip)) {
flags = sample->event;
@@ -584,17 +594,21 @@ void sync_buffer(int cpu)
cookie = get_exec_dcookie(mm);
add_user_ctx_switch(new, cookie);
}
+#ifdef CONFIG_XEN
if (flags & DOMAIN_SWITCH)
domain_switch = 1;
+#endif
if (op_cpu_buffer_get_size(&entry))
add_data(&entry, mm);
continue;
}
+#ifdef CONFIG_XEN
if (cpu_current_domain[cpu] != COORDINATOR_DOMAIN) {
add_sample_entry(sample->eip, sample->event);
continue;
}
+#endif
if (state < sb_bt_start)
/* ignore sample */
@@ -611,9 +625,11 @@ void sync_buffer(int cpu)
}
release_mm(mm);
+#ifdef CONFIG_XEN
/* We reset domain to COORDINATOR at each CPU switch */
if (cpu_current_domain[cpu] != COORDINATOR_DOMAIN)
add_domain_switch(COORDINATOR_DOMAIN);
+#endif
mark_done(cpu);
--- sle11sp1-2010-03-11.orig/drivers/oprofile/cpu_buffer.c 2009-12-04 10:44:41.000000000 +0100
+++ sle11sp1-2010-03-11/drivers/oprofile/cpu_buffer.c 2009-11-06 10:45:37.000000000 +0100
@@ -58,7 +58,11 @@ static void wq_sync_buffer(struct work_s
#define DEFAULT_TIMER_EXPIRE (HZ / 10)
static int work_enabled;
+#ifndef CONFIG_XEN
+#define current_domain COORDINATOR_DOMAIN
+#else
static int32_t current_domain = COORDINATOR_DOMAIN;
+#endif
unsigned long oprofile_get_cpu_buffer_size(void)
{
@@ -463,6 +467,7 @@ fail:
return;
}
+#ifdef CONFIG_XEN
int oprofile_add_domain_switch(int32_t domain_id)
{
struct oprofile_cpu_buffer * cpu_buf = &cpu_buffer[smp_processor_id()];
@@ -481,6 +486,7 @@ int oprofile_add_domain_switch(int32_t d
return 1;
}
+#endif
/*
* This serves to avoid cpu buffer overflow, and makes sure
--- sle11sp1-2010-03-11.orig/drivers/oprofile/oprof.c 2009-12-04 10:44:41.000000000 +0100
+++ sle11sp1-2010-03-11/drivers/oprofile/oprof.c 2009-11-06 10:45:37.000000000 +0100
@@ -39,6 +39,7 @@ static DEFINE_MUTEX(start_mutex);
*/
static int timer = 0;
+#ifdef CONFIG_XEN
int oprofile_set_active(int active_domains[], unsigned int adomains)
{
int err;
@@ -64,6 +65,7 @@ int oprofile_set_passive(int passive_dom
mutex_unlock(&start_mutex);
return err;
}
+#endif
int oprofile_setup(void)
{
--- sle11sp1-2010-03-11.orig/drivers/oprofile/oprofile_files.c 2009-12-04 10:44:41.000000000 +0100
+++ sle11sp1-2010-03-11/drivers/oprofile/oprofile_files.c 2009-11-06 10:45:37.000000000 +0100
@@ -171,6 +171,8 @@ static const struct file_operations dump
.write = dump_write,
};
+#ifdef CONFIG_XEN
+
#define TMPBUFSIZE 512
static unsigned int adomains = 0;
@@ -360,6 +362,8 @@ static const struct file_operations pass
.write = pdomain_write,
};
+#endif /* CONFIG_XEN */
+
void oprofile_create_files(struct super_block *sb, struct dentry *root)
{
/* reinitialize default values */
@@ -370,8 +374,10 @@ void oprofile_create_files(struct super_
oprofilefs_create_file(sb, root, "enable", &enable_fops);
oprofilefs_create_file_perm(sb, root, "dump", &dump_fops, 0666);
+#ifdef CONFIG_XEN
oprofilefs_create_file(sb, root, "active_domains", &active_domain_ops);
oprofilefs_create_file(sb, root, "passive_domains", &passive_domain_ops);
+#endif
oprofilefs_create_file(sb, root, "buffer", &event_buffer_fops);
oprofilefs_create_ulong(sb, root, "buffer_size", &oprofile_buffer_size);
oprofilefs_create_ulong(sb, root, "buffer_watershed", &oprofile_buffer_watershed);
--- sle11sp1-2010-03-11.orig/drivers/xen/core/smpboot.c 2009-05-19 09:16:41.000000000 +0200
+++ sle11sp1-2010-03-11/drivers/xen/core/smpboot.c 2009-11-06 10:45:37.000000000 +0100
@@ -57,7 +57,6 @@ u8 cpu_2_logical_apicid[NR_CPUS] = { [0
cpumask_t cpu_sibling_map[NR_CPUS] __cacheline_aligned;
cpumask_t cpu_core_map[NR_CPUS] __cacheline_aligned;
-EXPORT_SYMBOL(cpu_core_map);
#if defined(__i386__)
u8 x86_cpu_to_apicid[NR_CPUS] = { [0 ... NR_CPUS-1] = 0xff };
--- sle11sp1-2010-03-11.orig/include/linux/mm.h 2009-12-04 10:44:41.000000000 +0100
+++ sle11sp1-2010-03-11/include/linux/mm.h 2009-11-06 10:45:37.000000000 +0100
@@ -210,6 +210,7 @@ struct vm_operations_struct {
int (*access)(struct vm_area_struct *vma, unsigned long addr,
void *buf, int len, int write);
+#ifdef CONFIG_XEN
/* Area-specific function for clearing the PTE at @ptep. Returns the
* original value of @ptep. */
pte_t (*zap_pte)(struct vm_area_struct *vma,
@@ -217,6 +218,7 @@ struct vm_operations_struct {
/* called before close() to indicate no more pages should be mapped */
void (*unmap)(struct vm_area_struct *area);
+#endif
#ifdef CONFIG_NUMA
/*
--- sle11sp1-2010-03-11.orig/include/linux/oprofile.h 2009-12-04 10:44:41.000000000 +0100
+++ sle11sp1-2010-03-11/include/linux/oprofile.h 2009-11-06 10:45:37.000000000 +0100
@@ -16,8 +16,9 @@
#include <linux/types.h>
#include <linux/spinlock.h>
#include <asm/atomic.h>
-
+#ifdef CONFIG_XEN
#include <xen/interface/xenoprof.h>
+#endif
/* Each escaped entry is prefixed by ESCAPE_CODE
* then one of the following codes, then the
@@ -55,11 +56,12 @@ struct oprofile_operations {
/* create any necessary configuration files in the oprofile fs.
* Optional. */
int (*create_files)(struct super_block * sb, struct dentry * root);
+#ifdef CONFIG_XEN
/* setup active domains with Xen */
int (*set_active)(int *active_domains, unsigned int adomains);
/* setup passive domains with Xen */
int (*set_passive)(int *passive_domains, unsigned int pdomains);
-
+#endif
/* Do any necessary interrupt setup. Optional. */
int (*setup)(void);
/* Do any necessary interrupt shutdown. Optional. */
--- sle11sp1-2010-03-11.orig/include/linux/page-flags.h 2010-02-17 14:45:18.000000000 +0100
+++ sle11sp1-2010-03-11/include/linux/page-flags.h 2010-02-17 14:45:49.000000000 +0100
@@ -112,7 +112,7 @@ enum pageflags {
#endif
#ifdef CONFIG_XEN
PG_foreign, /* Page is owned by foreign allocator. */
- PG_netback, /* Page is owned by netback */
+ /* PG_netback, Page is owned by netback */
PG_blkback, /* Page is owned by blkback */
#endif
__NR_PAGEFLAGS,
@@ -359,9 +359,11 @@ CLEARPAGEFLAG(Uptodate, uptodate)
#define PageForeignDestructor(_page, order) \
((void (*)(struct page *, unsigned int))(_page)->index)(_page, order)
+#if 0
#define PageNetback(page) test_bit(PG_netback, &(page)->flags)
#define SetPageNetback(page) set_bit(PG_netback, &(page)->flags)
#define ClearPageNetback(page) clear_bit(PG_netback, &(page)->flags)
+#endif
#define PageBlkback(page) test_bit(PG_blkback, &(page)->flags)
#define SetPageBlkback(page) set_bit(PG_blkback, &(page)->flags)
--- sle11sp1-2010-03-11.orig/kernel/kexec.c 2009-12-04 10:44:41.000000000 +0100
+++ sle11sp1-2010-03-11/kernel/kexec.c 2009-11-06 10:45:37.000000000 +0100
@@ -45,8 +45,10 @@
#include <linux/kdb.h>
#endif
+#ifndef CONFIG_XEN
/* Per cpu memory for storing cpu states in case of system crash. */
note_buf_t* crash_notes;
+#endif
int dump_after_notifier;
/* vmcoreinfo stuff */
@@ -1168,6 +1170,7 @@ static void final_note(u32 *buf)
memcpy(buf, &note, sizeof(note));
}
+#ifndef CONFIG_XEN
void crash_save_cpu(struct pt_regs *regs, int cpu)
{
struct elf_prstatus prstatus;
@@ -1193,6 +1196,7 @@ void crash_save_cpu(struct pt_regs *regs
&prstatus, sizeof(prstatus));
final_note(buf);
}
+#endif
#ifdef CONFIG_SYSCTL
static ctl_table dump_after_notifier_table[] = {
@@ -1220,6 +1224,7 @@ static ctl_table kexec_sys_table[] = {
static int __init crash_notes_memory_init(void)
{
+#ifndef CONFIG_XEN
/* Allocate memory for saving cpu registers. */
crash_notes = alloc_percpu(note_buf_t);
if (!crash_notes) {
@@ -1227,6 +1232,7 @@ static int __init crash_notes_memory_ini
" states failed\n");
return -ENOMEM;
}
+#endif
#ifdef CONFIG_SYSCTL
register_sysctl_table(kexec_sys_table);
#endif
--- sle11sp1-2010-03-11.orig/mm/memory.c 2010-03-11 09:13:00.000000000 +0100
+++ sle11sp1-2010-03-11/mm/memory.c 2010-03-01 14:27:31.000000000 +0100
@@ -848,10 +848,12 @@ static unsigned long zap_pte_range(struc
page->index > details->last_index))
continue;
}
+#ifdef CONFIG_XEN
if (unlikely(vma->vm_ops && vma->vm_ops->zap_pte))
ptent = vma->vm_ops->zap_pte(vma, addr, pte,
tlb->fullmm);
else
+#endif
ptent = ptep_get_and_clear_full(mm, addr, pte,
tlb->fullmm);
tlb_remove_tlb_entry(tlb, pte, addr);
--- sle11sp1-2010-03-11.orig/mm/mmap.c 2010-03-11 09:13:04.000000000 +0100
+++ sle11sp1-2010-03-11/mm/mmap.c 2010-03-11 09:13:24.000000000 +0100
@@ -1844,8 +1844,10 @@ static void unmap_region(struct mm_struc
static inline void unmap_vma(struct vm_area_struct *vma)
{
+#ifdef CONFIG_XEN
if (unlikely(vma->vm_ops && vma->vm_ops->unmap))
vma->vm_ops->unmap(vma);
+#endif
}
/*
@@ -2158,8 +2160,10 @@ void exit_mmap(struct mm_struct *mm)
arch_exit_mmap(mm);
+#ifdef CONFIG_XEN
for (vma = mm->mmap; vma; vma = vma->vm_next)
unmap_vma(vma);
+#endif
vma = mm->mmap;
if (!vma) /* Can happen if dup_mmap() received an OOM */
xen3-fixup-kconfig Normal file
Subject: Fix xen configuration.
From: jbeulich@novell.com
Patch-mainline: obsolete
--- head-2009-12-16.orig/arch/x86/Kconfig 2009-12-04 10:44:40.000000000 +0100
+++ head-2009-12-16/arch/x86/Kconfig 2009-10-15 11:53:21.000000000 +0200
@@ -158,6 +158,7 @@ config HAVE_CPUMASK_OF_CPU_MAP
config ARCH_HIBERNATION_POSSIBLE
def_bool y
+ depends on !XEN
config ARCH_SUSPEND_POSSIBLE
def_bool y
--- head-2009-12-16.orig/arch/x86/Kconfig.debug 2009-12-04 10:44:40.000000000 +0100
+++ head-2009-12-16/arch/x86/Kconfig.debug 2009-10-15 11:53:21.000000000 +0200
@@ -289,7 +289,7 @@ config OPTIMIZE_INLINING
config KDB
bool "Built-in Kernel Debugger support"
- depends on DEBUG_KERNEL
+ depends on DEBUG_KERNEL && !XEN
select KALLSYMS
select KALLSYMS_ALL
help
--- head-2009-12-16.orig/drivers/xen/Kconfig 2009-12-04 10:44:40.000000000 +0100
+++ head-2009-12-16/drivers/xen/Kconfig 2009-12-18 12:08:28.000000000 +0100
@@ -22,6 +22,7 @@ config XEN_PRIVILEGED_GUEST
config XEN_UNPRIVILEGED_GUEST
def_bool !XEN_PRIVILEGED_GUEST
+ select PM
config XEN_PRIVCMD
def_bool y
@@ -116,7 +117,7 @@ config XEN_NETDEV_LOOPBACK
config XEN_PCIDEV_BACKEND
tristate "PCI-device backend driver"
- depends on PCI && XEN_BACKEND
+ depends on PCI && XEN_PRIVILEGED_GUEST && XEN_BACKEND
default XEN_BACKEND
help
The PCI device backend driver allows the kernel to export arbitrary
@@ -127,8 +128,8 @@ config XEN_PCIDEV_BACKEND
choice
prompt "PCI Backend Mode"
depends on XEN_PCIDEV_BACKEND
- default XEN_PCIDEV_BACKEND_VPCI if !IA64
default XEN_PCIDEV_BACKEND_CONTROLLER if IA64
+ default XEN_PCIDEV_BACKEND_VPCI
config XEN_PCIDEV_BACKEND_VPCI
bool "Virtual PCI"
xen3-fixup-xen Normal file
File diff suppressed because it is too large
xen3-patch-2.6.18 Normal file
From: www.kernel.org
Subject: Linux 2.6.18
Patch-mainline: 2.6.18
Automatically created from "patches.kernel.org/patch-2.6.18" by xen-port-patches.py
Acked-by: jbeulich@novell.com
--- sle11sp1-2010-03-01.orig/arch/x86/Kconfig 2009-10-15 11:53:21.000000000 +0200
+++ sle11sp1-2010-03-01/arch/x86/Kconfig 2010-02-09 16:47:07.000000000 +0100
@@ -63,7 +63,6 @@ config ARCH_DEFCONFIG
config GENERIC_TIME
def_bool y
- depends on !X86_XEN
config GENERIC_CMOS_UPDATE
def_bool y
@@ -1617,7 +1616,7 @@ config KEXEC_JUMP
code in physical address mode via KEXEC
config PHYSICAL_START
- hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
+ hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP || XEN)
default "0x1000000"
---help---
This gives the physical address where the kernel is loaded.
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/Makefile 2009-12-04 10:44:45.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/kernel/Makefile 2009-11-06 10:45:44.000000000 +0100
@@ -138,5 +138,5 @@ ifeq ($(CONFIG_X86_64),y)
pci-dma_64-$(CONFIG_XEN) += pci-dma_32.o
endif
-disabled-obj-$(CONFIG_XEN) := i8259_$(BITS).o reboot.o smpboot_$(BITS).o
+disabled-obj-$(CONFIG_XEN) := i8253.o i8259_$(BITS).o reboot.o smpboot_$(BITS).o tsc_$(BITS).o
%/head_$(BITS).o %/head_$(BITS).s: $(if $(CONFIG_XEN),EXTRA_AFLAGS,dummy) :=
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/setup64-xen.c 2008-01-28 12:24:19.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/kernel/setup64-xen.c 2009-11-06 10:45:44.000000000 +0100
@@ -363,5 +363,7 @@ void __cpuinit cpu_init (void)
fpu_init();
- raw_local_save_flags(kernel_eflags);
+ asm ("pushfq; popq %0" : "=rm" (kernel_eflags));
+ if (raw_irqs_disabled())
+ kernel_eflags &= ~X86_EFLAGS_IF;
}
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/time-xen.c 2010-03-01 14:03:37.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/kernel/time-xen.c 2010-03-01 14:27:44.000000000 +0100
@@ -45,7 +45,6 @@
#include <linux/sysdev.h>
#include <linux/bcd.h>
#include <linux/efi.h>
-#include <linux/mca.h>
#include <linux/sysctl.h>
#include <linux/percpu.h>
#include <linux/kernel_stat.h>
@@ -76,8 +75,13 @@
#if defined (__i386__)
#include <asm/i8259.h>
+#include <asm/i8253.h>
+DEFINE_SPINLOCK(i8253_lock);
+EXPORT_SYMBOL(i8253_lock);
#endif
+#define XEN_SHIFT 22
+
int pit_latch_buggy; /* extern */
#if defined(__x86_64__)
@@ -97,10 +101,6 @@ extern unsigned long wall_jiffies;
DEFINE_SPINLOCK(rtc_lock);
EXPORT_SYMBOL(rtc_lock);
-extern struct init_timer_opts timer_tsc_init;
-extern struct timer_opts timer_tsc;
-#define timer_none timer_tsc
-
/* These are peridically updated in shared_info, and then copied here. */
struct shadow_time_info {
u64 tsc_timestamp; /* TSC at last update of time vals. */
@@ -175,24 +175,6 @@ static int __init __permitted_clock_jitt
}
__setup("permitted_clock_jitter=", __permitted_clock_jitter);
-#if 0
-static void delay_tsc(unsigned long loops)
-{
- unsigned long bclock, now;
-
- rdtscl(bclock);
- do {
- rep_nop();
- rdtscl(now);
- } while ((now - bclock) < loops);
-}
-
-struct timer_opts timer_tsc = {
- .name = "tsc",
- .delay = delay_tsc,
-};
-#endif
-
/*
* Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
* yielding a 64-bit result.
@@ -229,14 +211,6 @@ static inline u64 scale_delta(u64 delta,
return product;
}
-#if 0 /* defined (__i386__) */
-int read_current_timer(unsigned long *timer_val)
-{
- rdtscl(*timer_val);
- return 0;
-}
-#endif
-
void init_cpu_khz(void)
{
u64 __cpu_khz = 1000000ULL << 32;
@@ -256,6 +230,7 @@ static u64 get_nsec_offset(struct shadow
return scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift);
}
+#ifdef CONFIG_X86_64
static unsigned long get_usec_offset(struct shadow_time_info *shadow)
{
u64 now, delta;
@@ -263,6 +238,7 @@ static unsigned long get_usec_offset(str
delta = now - shadow->tsc_timestamp;
return scale_delta(delta, shadow->tsc_to_usec_mul, shadow->tsc_shift);
}
+#endif
static void __update_wallclock(time_t sec, long nsec)
{
@@ -377,6 +353,8 @@ void rtc_cmos_write(unsigned char val, u
}
EXPORT_SYMBOL(rtc_cmos_write);
+#ifdef CONFIG_X86_64
+
/*
* This version of gettimeofday has microsecond resolution
* and better than microsecond precision on fast x86 machines with TSC.
@@ -515,6 +493,8 @@ int do_settimeofday(struct timespec *tv)
EXPORT_SYMBOL(do_settimeofday);
+#endif
+
static void sync_xen_wallclock(unsigned long dummy);
static DEFINE_TIMER(sync_xen_wallclock_timer, sync_xen_wallclock, 0, 0);
static void sync_xen_wallclock(unsigned long dummy)
@@ -566,11 +546,15 @@ static int set_rtc_mmss(unsigned long no
return retval;
}
+#ifdef CONFIG_X86_64
/* monotonic_clock(): returns # of nanoseconds passed since time_init()
* Note: This function is required to return accurate
* time even in the absence of multiple timer ticks.
*/
unsigned long long monotonic_clock(void)
+#else
+unsigned long long sched_clock(void)
+#endif
{
unsigned int cpu = get_cpu();
struct shadow_time_info *shadow = &per_cpu(shadow_time, cpu);
@@ -590,9 +574,9 @@ unsigned long long monotonic_clock(void)
return time;
}
+#ifdef CONFIG_X86_64
EXPORT_SYMBOL(monotonic_clock);
-#ifdef __x86_64__
unsigned long long sched_clock(void)
{
return monotonic_clock();
@@ -762,6 +746,89 @@ irqreturn_t timer_interrupt(int irq, voi
return IRQ_HANDLED;
}
+#ifndef CONFIG_X86_64
+
+void tsc_init(void)
+{
+ init_cpu_khz();
+ printk(KERN_INFO "Xen reported: %u.%03u MHz processor.\n",
+ cpu_khz / 1000, cpu_khz % 1000);
+
+ use_tsc_delay();
+}
+
+#include <linux/clocksource.h>
+
+void mark_tsc_unstable(void)
+{
+#ifndef CONFIG_XEN /* XXX Should tell the hypervisor about this fact. */
+ tsc_unstable = 1;
+#endif
+}
+EXPORT_SYMBOL_GPL(mark_tsc_unstable);
+
+static cycle_t xen_clocksource_read(void)
+{
+#ifdef CONFIG_SMP
+ static cycle_t last_ret;
+#ifndef CONFIG_64BIT
+ cycle_t last = cmpxchg64(&last_ret, 0, 0);
+#else
+ cycle_t last = last_ret;
+#define cmpxchg64 cmpxchg
+#endif
+ cycle_t ret = sched_clock();
+
+ if (unlikely((s64)(ret - last) < 0)) {
+ if (last - ret > permitted_clock_jitter
+ && printk_ratelimit()) {
+ unsigned int cpu = get_cpu();
+ struct shadow_time_info *shadow = &per_cpu(shadow_time, cpu);
+
+ printk(KERN_WARNING "clocksource/%u: "
+ "Time went backwards: "
+ "ret=%Lx delta=%Ld shadow=%Lx offset=%Lx\n",
+ cpu, ret, ret - last, shadow->system_timestamp,
+ get_nsec_offset(shadow));
+ put_cpu();
+ }
+ return last;
+ }
+
+ for (;;) {
+ cycle_t cur = cmpxchg64(&last_ret, last, ret);
+
+ if (cur == last || (s64)(ret - cur) < 0)
+ return ret;
+ last = cur;
+ }
+#else
+ return sched_clock();
+#endif
+}
+
+static struct clocksource clocksource_xen = {
+ .name = "xen",
+ .rating = 400,
+ .read = xen_clocksource_read,
+ .mask = CLOCKSOURCE_MASK(64),
+ .mult = 1 << XEN_SHIFT, /* time directly in nanoseconds */
+ .shift = XEN_SHIFT,
+ .is_continuous = 1,
+};
+
+static int __init init_xen_clocksource(void)
+{
+ clocksource_xen.mult = clocksource_khz2mult(cpu_khz,
+ clocksource_xen.shift);
+
+ return clocksource_register(&clocksource_xen);
+}
+
+module_init(init_xen_clocksource);
+
+#endif
+
static void init_missing_ticks_accounting(unsigned int cpu)
{
struct vcpu_register_runstate_memory_area area;
@@ -908,7 +975,7 @@ static void setup_cpu0_timer_irq(void)
VIRQ_TIMER,
0,
timer_interrupt,
- SA_INTERRUPT,
+ IRQF_DISABLED|IRQF_TIMER,
"timer0",
NULL);
BUG_ON(per_cpu(timer_irq, 0) < 0);
@@ -950,11 +1017,11 @@ void __init time_init(void)
update_wallclock();
+#ifdef CONFIG_X86_64
init_cpu_khz();
printk(KERN_INFO "Xen reported: %u.%03u MHz processor.\n",
cpu_khz / 1000, cpu_khz % 1000);
-#if defined(__x86_64__)
vxtime.mode = VXTIME_TSC;
vxtime.quot = (1000000L << 32) / vxtime_hz;
vxtime.tsc_quot = (1000L << 32) / cpu_khz;
@@ -1129,7 +1196,7 @@ int __cpuinit local_setup_timer(unsigned
irq = bind_virq_to_irqhandler(VIRQ_TIMER,
cpu,
timer_interrupt,
- SA_INTERRUPT,
+ IRQF_DISABLED|IRQF_TIMER,
timer_name[cpu],
NULL);
if (irq < 0)
--- sle11sp1-2010-03-01.orig/drivers/char/agp/intel-agp.c 2010-01-20 10:22:01.000000000 +0100
+++ sle11sp1-2010-03-01/drivers/char/agp/intel-agp.c 2010-01-20 10:22:47.000000000 +0100
@@ -409,6 +409,10 @@ static struct page *i8xx_alloc_pages(voi
if (set_pages_uc(page, 4) < 0) {
set_pages_wb(page, 4);
+#ifdef CONFIG_XEN
+ xen_destroy_contiguous_region((unsigned long)page_address(page),
+ 2);
+#endif
__free_pages(page, 2);
return NULL;
}
--- sle11sp1-2010-03-01.orig/drivers/xen/console/console.c 2009-03-18 10:39:31.000000000 +0100
+++ sle11sp1-2010-03-01/drivers/xen/console/console.c 2009-11-06 10:45:44.000000000 +0100
@@ -94,7 +94,6 @@ static int __init xencons_setup(char *st
{
char *q;
int n;
- extern int console_use_vt;
console_use_vt = 1;
if (!strncmp(str, "ttyS", 4)) {
--- sle11sp1-2010-03-01.orig/arch/x86/include/mach-xen/asm/pgtable_64.h 2009-06-23 09:28:21.000000000 +0200
+++ sle11sp1-2010-03-01/arch/x86/include/mach-xen/asm/pgtable_64.h 2009-11-06 10:45:44.000000000 +0100
@@ -394,7 +394,6 @@ static inline int pmd_large(pmd_t pte) {
/*
* Level 4 access.
- * Never use these in the common code.
*/
#define pgd_page(pgd) ((unsigned long) __va(pgd_val(pgd) & PTE_MASK))
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1))
--- sle11sp1-2010-03-01.orig/arch/x86/include/mach-xen/asm/processor_32.h 2008-01-28 12:24:19.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/include/mach-xen/asm/processor_32.h 2009-11-06 10:45:44.000000000 +0100
@@ -23,7 +23,7 @@
#include <xen/interface/physdev.h>
/* flag for disabling the tsc */
-extern int tsc_disable;
+#define tsc_disable 0
struct desc_struct {
unsigned long a,b;
--- sle11sp1-2010-03-01.orig/arch/x86/include/asm/thread_info.h 2010-03-01 14:09:07.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/include/asm/thread_info.h 2010-02-09 16:47:00.000000000 +0100
@@ -143,11 +143,15 @@ struct thread_info {
(_TIF_SIGPENDING|_TIF_MCE_NOTIFY|_TIF_NOTIFY_RESUME)
/* flags to check in __switch_to() */
+#ifndef CONFIG_XEN
#define _TIF_WORK_CTXSW \
(_TIF_IO_BITMAP|_TIF_DEBUGCTLMSR|_TIF_DS_AREA_MSR|_TIF_NOTSC)
#define _TIF_WORK_CTXSW_PREV _TIF_WORK_CTXSW
#define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW|_TIF_DEBUG)
+#else
+#define _TIF_WORK_CTXSW _TIF_DEBUG
+#endif
#define PREEMPT_ACTIVE 0x10000000

12710
xen3-patch-2.6.19 Normal file

File diff suppressed because it is too large

7165
xen3-patch-2.6.20 Normal file

File diff suppressed because it is too large

5056
xen3-patch-2.6.21 Normal file

File diff suppressed because it is too large

7483
xen3-patch-2.6.22 Normal file

File diff suppressed because it is too large

5273
xen3-patch-2.6.23 Normal file

File diff suppressed because it is too large

8553
xen3-patch-2.6.24 Normal file

File diff suppressed because it is too large

28760
xen3-patch-2.6.25 Normal file

File diff suppressed because it is too large

20824
xen3-patch-2.6.26 Normal file

File diff suppressed because it is too large

26166
xen3-patch-2.6.27 Normal file

File diff suppressed because it is too large

24226
xen3-patch-2.6.28 Normal file

File diff suppressed because it is too large

11739
xen3-patch-2.6.29 Normal file

File diff suppressed because it is too large

18363
xen3-patch-2.6.30 Normal file

File diff suppressed because it is too large

7169
xen3-patch-2.6.31 Normal file

File diff suppressed because it is too large

6497
xen3-patch-2.6.32 Normal file

File diff suppressed because it is too large

93
xen3-patch-2.6.32.1-2 Normal file

@@ -0,0 +1,93 @@
From: Greg Kroah-Hartman <gregkh@suse.de>
Subject: Linux 2.6.32.2
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Automatically created from "patches.kernel.org/patch-2.6.32.1-2" by xen-port-patches.py
--- head-2010-01-04.orig/arch/x86/kernel/pci-dma-xen.c 2009-11-18 14:54:16.000000000 +0100
+++ head-2010-01-04/arch/x86/kernel/pci-dma-xen.c 2010-01-04 12:50:03.000000000 +0100
@@ -268,7 +268,7 @@ static __init int iommu_setup(char *p)
if (!strncmp(p, "allowdac", 8))
forbid_dac = 0;
if (!strncmp(p, "nodac", 5))
- forbid_dac = -1;
+ forbid_dac = 1;
if (!strncmp(p, "usedac", 6)) {
forbid_dac = -1;
return 1;
--- head-2010-01-04.orig/arch/x86/kernel/setup-xen.c 2009-11-18 14:54:16.000000000 +0100
+++ head-2010-01-04/arch/x86/kernel/setup-xen.c 2010-01-04 12:50:03.000000000 +0100
@@ -109,6 +109,7 @@
#ifdef CONFIG_X86_64
#include <asm/numa_64.h>
#endif
+#include <asm/mce.h>
#ifdef CONFIG_XEN
#include <asm/hypervisor.h>
@@ -1260,6 +1261,8 @@ void __init setup_arch(char **cmdline_p)
#endif
#endif /* CONFIG_XEN */
x86_init.oem.banner();
+
+ mcheck_intel_therm_init();
}
#ifdef CONFIG_X86_32
--- head-2010-01-04.orig/drivers/xen/blktap2/sysfs.c 2009-12-09 16:14:04.000000000 +0100
+++ head-2010-01-04/drivers/xen/blktap2/sysfs.c 2010-01-04 12:54:11.000000000 +0100
@@ -39,11 +39,11 @@ blktap_sysfs_exit(struct blktap *tap)
static ssize_t blktap_sysfs_pause_device(struct device *,
struct device_attribute *,
const char *, size_t);
-DEVICE_ATTR(pause, S_IWUSR, NULL, blktap_sysfs_pause_device);
+static DEVICE_ATTR(pause, S_IWUSR, NULL, blktap_sysfs_pause_device);
static ssize_t blktap_sysfs_resume_device(struct device *,
struct device_attribute *,
const char *, size_t);
-DEVICE_ATTR(resume, S_IWUSR, NULL, blktap_sysfs_resume_device);
+static DEVICE_ATTR(resume, S_IWUSR, NULL, blktap_sysfs_resume_device);
static ssize_t
blktap_sysfs_set_name(struct device *dev, struct device_attribute *attr,
@@ -103,8 +103,8 @@ blktap_sysfs_get_name(struct device *dev
return size;
}
-DEVICE_ATTR(name, S_IRUSR | S_IWUSR,
- blktap_sysfs_get_name, blktap_sysfs_set_name);
+static DEVICE_ATTR(name, S_IRUSR | S_IWUSR,
+ blktap_sysfs_get_name, blktap_sysfs_set_name);
static ssize_t
blktap_sysfs_remove_device(struct device *dev, struct device_attribute *attr,
@@ -123,7 +123,7 @@ blktap_sysfs_remove_device(struct device
return (err ? : size);
}
-DEVICE_ATTR(remove, S_IWUSR, NULL, blktap_sysfs_remove_device);
+static DEVICE_ATTR(remove, S_IWUSR, NULL, blktap_sysfs_remove_device);
static ssize_t
blktap_sysfs_pause_device(struct device *dev, struct device_attribute *attr,
@@ -293,7 +293,7 @@ out:
return ret;
}
-DEVICE_ATTR(debug, S_IRUSR, blktap_sysfs_debug_device, NULL);
+static DEVICE_ATTR(debug, S_IRUSR, blktap_sysfs_debug_device, NULL);
int
blktap_sysfs_create(struct blktap *tap)
--- head-2010-01-04.orig/drivers/xen/xenbus/xenbus_probe.c 2009-11-06 10:52:23.000000000 +0100
+++ head-2010-01-04/drivers/xen/xenbus/xenbus_probe.c 2010-01-04 12:52:55.000000000 +0100
@@ -562,7 +562,7 @@ static ssize_t xendev_show_modalias(stru
{
return sprintf(buf, "xen:%s\n", to_xenbus_device(dev)->devicetype);
}
-DEVICE_ATTR(modalias, S_IRUSR | S_IRGRP | S_IROTH, xendev_show_modalias, NULL);
+static DEVICE_ATTR(modalias, S_IRUSR | S_IRGRP | S_IROTH, xendev_show_modalias, NULL);
int xenbus_probe_node(struct xen_bus_type *bus,
const char *type,

19
xen3-patch-2.6.32.2-3 Normal file

@@ -0,0 +1,19 @@
From: Greg Kroah-Hartman <gregkh@suse.de>
Subject: Linux 2.6.32.3
Patch-mainline: 2.6.32.3
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Automatically created from "patches.kernel.org/patch-2.6.32.2-3" by xen-port-patches.py
--- sle11sp1-2010-03-11.orig/arch/x86/include/mach-xen/asm/processor.h 2010-03-17 14:36:55.000000000 +0100
+++ sle11sp1-2010-03-11/arch/x86/include/mach-xen/asm/processor.h 2010-03-17 14:37:31.000000000 +0100
@@ -181,7 +181,7 @@ static inline void xen_cpuid(unsigned in
unsigned int *ecx, unsigned int *edx)
{
/* ecx is often an input as well as an output. */
- asm(XEN_CPUID
+ asm volatile(XEN_CPUID
: "=a" (*eax),
"=b" (*ebx),
"=c" (*ecx),

19
xen3-patch-2.6.32.3-4 Normal file

@@ -0,0 +1,19 @@
From: Greg Kroah-Hartman <gregkh@suse.de>
Subject: Linux 2.6.32.4
Patch-mainline: 2.6.32.4
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Automatically created from "patches.kernel.org/patch-2.6.32.3-4" by xen-port-patches.py
--- sle11sp1-2010-01-20.orig/arch/x86/ia32/ia32entry-xen.S 2009-11-06 14:53:39.000000000 +0100
+++ sle11sp1-2010-01-20/arch/x86/ia32/ia32entry-xen.S 2010-01-20 10:28:42.000000000 +0100
@@ -589,7 +589,7 @@ ia32_sys_call_table:
.quad quiet_ni_syscall /* streams2 */
.quad stub32_vfork /* 190 */
.quad compat_sys_getrlimit
- .quad sys32_mmap2
+ .quad sys_mmap_pgoff
.quad sys32_truncate64
.quad sys32_ftruncate64
.quad sys32_stat64 /* 195 */

104
xen3-patch-2.6.32.7-8 Normal file

@@ -0,0 +1,104 @@
From: Greg Kroah-Hartman <gregkh@suse.de>
Subject: Linux 2.6.32.8
Patch-mainline: 2.6.32.8
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Automatically created from "patches.kernel.org/patch-2.6.32.7-8" by xen-port-patches.py
--- sle11sp1-2010-03-11.orig/arch/x86/kernel/process-xen.c 2009-11-06 10:52:23.000000000 +0100
+++ sle11sp1-2010-03-11/arch/x86/kernel/process-xen.c 2010-02-09 17:12:56.000000000 +0100
@@ -93,18 +93,6 @@ void flush_thread(void)
{
struct task_struct *tsk = current;
-#ifdef CONFIG_X86_64
- if (test_tsk_thread_flag(tsk, TIF_ABI_PENDING)) {
- clear_tsk_thread_flag(tsk, TIF_ABI_PENDING);
- if (test_tsk_thread_flag(tsk, TIF_IA32)) {
- clear_tsk_thread_flag(tsk, TIF_IA32);
- } else {
- set_tsk_thread_flag(tsk, TIF_IA32);
- current_thread_info()->status |= TS_COMPAT;
- }
- }
-#endif
-
clear_tsk_thread_flag(tsk, TIF_DEBUG);
tsk->thread.debugreg0 = 0;
--- sle11sp1-2010-03-11.orig/arch/x86/kernel/process_64-xen.c 2010-03-17 14:37:05.000000000 +0100
+++ sle11sp1-2010-03-11/arch/x86/kernel/process_64-xen.c 2010-03-17 14:38:41.000000000 +0100
@@ -615,6 +615,17 @@ sys_clone(unsigned long clone_flags, uns
return do_fork(clone_flags, newsp, regs, 0, parent_tid, child_tid);
}
+void set_personality_ia32(void)
+{
+ /* inherit personality from parent */
+
+ /* Make sure to be in 32bit mode */
+ set_thread_flag(TIF_IA32);
+
+ /* Prepare the first "return" to user space */
+ current_thread_info()->status |= TS_COMPAT;
+}
+
unsigned long get_wchan(struct task_struct *p)
{
unsigned long stack;
--- sle11sp1-2010-03-11.orig/arch/x86/kernel/quirks-xen.c 2009-11-06 10:52:23.000000000 +0100
+++ sle11sp1-2010-03-11/arch/x86/kernel/quirks-xen.c 2010-02-09 17:12:56.000000000 +0100
@@ -492,6 +492,19 @@ void force_hpet_resume(void)
break;
}
}
+
+/*
+ * HPET MSI on some boards (ATI SB700/SB800) has side effect on
+ * floppy DMA. Disable HPET MSI on such platforms.
+ */
+static void force_disable_hpet_msi(struct pci_dev *unused)
+{
+ hpet_msi_disable = 1;
+}
+
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
+ force_disable_hpet_msi);
+
#endif
#if defined(CONFIG_PCI) && defined(CONFIG_NUMA)
--- sle11sp1-2010-03-11.orig/arch/x86/kernel/setup-xen.c 2010-01-04 12:50:03.000000000 +0100
+++ sle11sp1-2010-03-11/arch/x86/kernel/setup-xen.c 2010-02-09 17:12:56.000000000 +0100
@@ -736,19 +736,27 @@ static struct dmi_system_id __initdata b
DMI_MATCH(DMI_BIOS_VENDOR, "Phoenix/MSC"),
},
},
- {
/*
- * AMI BIOS with low memory corruption was found on Intel DG45ID board.
- * It hase different DMI_BIOS_VENDOR = "Intel Corp.", for now we will
+ * AMI BIOS with low memory corruption was found on Intel DG45ID and
+ * DG45FC boards.
+ * It has a different DMI_BIOS_VENDOR = "Intel Corp.", for now we will
* match only DMI_BOARD_NAME and see if there is more bad products
* with this vendor.
*/
+ {
.callback = dmi_low_memory_corruption,
.ident = "AMI BIOS",
.matches = {
DMI_MATCH(DMI_BOARD_NAME, "DG45ID"),
},
},
+ {
+ .callback = dmi_low_memory_corruption,
+ .ident = "AMI BIOS",
+ .matches = {
+ DMI_MATCH(DMI_BOARD_NAME, "DG45FC"),
+ },
+ },
#endif
{}
};

18
xen3-patch-2.6.32.8-9 Normal file

@@ -0,0 +1,18 @@
From: Greg Kroah-Hartman <gregkh@suse.de>
Subject: Linux 2.6.32.9
Patch-mainline: 2.6.32.9
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Automatically created from "patches.kernel.org/patch-2.6.32.8-9" by xen-port-patches.py
--- sle11sp1-2010-03-01.orig/arch/x86/kernel/apic/io_apic-xen.c 2009-11-06 10:52:22.000000000 +0100
+++ sle11sp1-2010-03-01/arch/x86/kernel/apic/io_apic-xen.c 2010-03-01 14:44:43.000000000 +0100
@@ -3267,6 +3267,7 @@ unsigned int create_irq_nr(unsigned int
continue;
desc_new = move_irq_desc(desc_new, node);
+ cfg_new = desc_new->chip_data;
if (__assign_irq_vector(new, cfg_new, apic->target_cpus()) == 0)
irq = new;

Some files were not shown because too many files have changed in this diff